utils

Utilities and additional functions for electropherogram analysis.

Author: Anja Hess

Date: 2025-NOV-10

Functions

vartest(stats_groups[, alpha])

For validating that the ANOVA is used in normally distributed scenarios

normality(stats_groups[, alpha])

While Shapiro's test does not confirm that the sample stems from a normal distribution, it can at least reject this hypothesis

mean_from_histogram(df[, unit, size_unit, sample_unit])

Function to estimate the mean size of a patient's or sample's DNA fragments (in base pairs)

distribution_stats(df[, save_dir, unit, size_unit, ...])

Compute basic distribution statistics for each sample. Includes: skewness, entropy, AUC

merge_tables(signal_tables[, save_dir, meta_dict])

Function to create a composite from multiple image outputs (Multi-image processing)

wide_to_long(df[, id_var, var_name, value_name])

Function to convert a wide dataframe to long format

integrate(df[, ladders_present])

Beta: a function that will, in the future, help handle the resulting "gaps" when multiple ladders are used within the same signal table.

Module Contents

utils.vartest(stats_groups, alpha=0.05)

For validating that the ANOVA is used in normally distributed scenarios

Parameters:
  • stats_groups

  • alpha

Returns:

utils.normality(stats_groups, alpha=0.05)

While Shapiro’s test does not confirm the sample stems from a normal distribution, we can at least reject this hypothesis and argue for the necessity to perform a non-parametric test.
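The reasoning above can be sketched with `scipy.stats.shapiro`. This is a minimal illustration and not the module's implementation: the helper name `reject_normality` is hypothetical, and it assumes `stats_groups` maps group names to arrays of values, which the source does not state.

```python
import numpy as np
from scipy import stats


def reject_normality(groups, alpha=0.05):
    """Hypothetical helper: True means Shapiro-Wilk rejects normality for that group."""
    results = {}
    for name, values in groups.items():
        # shapiro returns the test statistic and the p-value
        statistic, p_value = stats.shapiro(values)
        # A small p-value rejects the null hypothesis of normality;
        # a large one does NOT prove the data are normal.
        results[name] = bool(p_value < alpha)
    return results


rng = np.random.default_rng(0)
groups = {
    "roughly_normal": rng.normal(loc=170, scale=10, size=50),
    "skewed": rng.exponential(scale=1.0, size=200),
}
rejected = reject_normality(groups)
```

If any group is flagged, a non-parametric test (for example Kruskal-Wallis instead of ANOVA) is the safer choice.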

Parameters:
  • stats_groups

  • alpha – float

Returns:

utils.mean_from_histogram(df, unit='', size_unit='', sample_unit='sample')

Function to estimate the mean size of a patient's or sample's DNA fragments (in base pairs) based on the fluorescence signal table. The strategy is to create a histogram and then infer the metrics from it.

Parameters:
  • df – pandas.DataFrame

  • unit – str, usually normalized fluorescence unit

  • size_unit – str, fragment size unit (base pairs)

Returns:

float, average fragment size
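One way to picture the histogram strategy is a signal-weighted mean, where the fluorescence at each position acts as the count for that fragment size. The sketch below is an assumption about the approach, not the module's code: the helper `weighted_mean_size` is hypothetical, and the column names are borrowed from the defaults of `distribution_stats`.

```python
import numpy as np
import pandas as pd


def weighted_mean_size(df, unit="normalized_fluorescent_units", size_unit="bp_pos"):
    """Hypothetical helper: mean fragment size, weighted by signal intensity."""
    # Negative signal (e.g. after baseline correction) cannot act as a weight
    weights = df[unit].clip(lower=0)
    return float(np.average(df[size_unit], weights=weights))


df = pd.DataFrame({
    "bp_pos": [100, 200, 300],
    "normalized_fluorescent_units": [1.0, 2.0, 1.0],
})
mean_size = weighted_mean_size(df)  # (100*1 + 200*2 + 300*1) / 4 = 200.0
```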

utils.distribution_stats(df, save_dir='', unit='normalized_fluorescent_units', size_unit='bp_pos', sample_unit='sample')

Compute basic distribution statistics for each sample. Includes: skewness, entropy, AUC

Parameters:
  • df – pandas dataframe

  • save_dir – str

  • unit – str

  • size_unit – str

  • sample_unit – str

Returns:

basic stats dataframe
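The three statistics can be sketched per sample as follows. This is an illustration under stated assumptions, not the module's implementation: the helper names are hypothetical, the input is assumed to be a long-format table with the default column names, skewness is taken over the signal-weighted size distribution, and entropy is the Shannon entropy of the normalized signal.

```python
import numpy as np
import pandas as pd


def sample_stats(group, unit, size_unit):
    """Hypothetical helper: AUC, weighted skewness and entropy for one sample."""
    group = group.sort_values(size_unit)
    x = group[size_unit].to_numpy(dtype=float)
    y = group[unit].to_numpy(dtype=float).clip(min=0)
    # Area under the curve via the trapezoidal rule
    auc = float(np.sum((y[1:] + y[:-1]) / 2 * np.diff(x)))
    # Normalize the signal so it can be treated as a probability distribution
    p = y / y.sum()
    mean = np.sum(p * x)
    std = np.sqrt(np.sum(p * (x - mean) ** 2))
    skewness = float(np.sum(p * ((x - mean) / std) ** 3))
    nonzero = p[p > 0]
    entropy = float(-np.sum(nonzero * np.log(nonzero)))
    return pd.Series({"auc": auc, "skewness": skewness, "entropy": entropy})


def basic_stats(df, unit="normalized_fluorescent_units",
                size_unit="bp_pos", sample_unit="sample"):
    """Hypothetical helper: one row of statistics per sample."""
    rows = {name: sample_stats(g, unit, size_unit)
            for name, g in df.groupby(sample_unit)}
    return pd.DataFrame(rows).T


df = pd.DataFrame({
    "sample": ["a"] * 3 + ["b"] * 3,
    "bp_pos": [100, 200, 300] * 2,
    "normalized_fluorescent_units": [1.0, 2.0, 1.0, 3.0, 1.0, 0.0],
})
stats_table = basic_stats(df)
```

In this toy input, sample `a` is symmetric around 200 bp, while sample `b` has most of its signal at short fragments with a tail toward longer ones, so its skewness is positive.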

utils.merge_tables(signal_tables, save_dir='', meta_dict=False)

Function to create a composite from multiple image outputs (Multi-image processing)

Parameters:
  • signal_tables – list of directories to signal tables created from gel images

  • save_dir – str

Returns:

will save the composite to

utils.wide_to_long(df, id_var='pos', var_name='sample', value_name='value')

Function to convert a wide dataframe to long format

Parameters:
  • df – pandas.DataFrame in wide format

  • id_var – str, the column of the wide dataframe containing the id variable

  • var_name – str, the new column in the long dataframe containing the variable name

  • value_name – str, the new column in the long dataframe containing the value

Returns:

pandas.DataFrame
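The same reshaping can be illustrated with `pandas.DataFrame.melt`, using the default argument names from the signature. Whether `wide_to_long` wraps `melt` internally is an assumption, and the sample column names are made up for the example.

```python
import pandas as pd

# Wide format: one row per position, one column per sample
wide = pd.DataFrame({
    "pos": [100, 200],
    "sample_1": [0.1, 0.4],
    "sample_2": [0.3, 0.2],
})

# Long format: one row per (position, sample) pair
long = wide.melt(id_vars="pos", var_name="sample", value_name="value")
```

The result has one row for each combination of position and sample, with the columns `pos`, `sample` and `value`.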

utils.integrate(df, ladders_present='')

Beta: a function that will, in the future, help handle the resulting “gaps” when multiple ladders are used within the same signal table.

NOTE: Not implemented yet.

Parameters:
  • df – pandas dataframe

  • ladders_present – list of strings

Returns:

a new pandas dataframe that contains no NaN values despite multiple ladders