utils
Utilities and additional functions for electropherogram analysis.
Author: Anja Hess
Date: 2025-NOV-10
Functions
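- vartest(stats_groups, alpha=0.05) – For validating that ANOVA is used in normally distributed scenarios
- normality(stats_groups, alpha=0.05) – While Shapiro's test does not confirm the sample stems from a normal distribution, we can at least reject this hypothesis and argue for the necessity to perform a non-parametric test.
- mean_from_histogram(df, unit='', size_unit='', sample_unit='sample') – Function to estimate the mean size of a patient's or sample's DNA fragments
- distribution_stats(df, save_dir='', unit='normalized_fluorescent_units', size_unit='bp_pos', sample_unit='sample') – Compute basic distribution statistics for each sample. Includes: skewness, entropy, AUC
- merge_tables(signal_tables, save_dir='', meta_dict=False) – Function to create a composite from multiple image outputs (multi-image processing)
- wide_to_long(df, id_var='pos', var_name='sample', value_name='value') – Function to transform a wide dataframe to long format
- integrate(df, ladders_present='') – Beta: a function that will in the future help handle the "gaps" that result when using multiple ladders within the same signal table.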
Module Contents
- utils.vartest(stats_groups, alpha=0.05)
For validating that ANOVA is used in normally distributed scenarios.
- Parameters:
stats_groups
alpha – float
- Returns:
- utils.normality(stats_groups, alpha=0.05)
Shapiro’s test cannot confirm that a sample stems from a normal distribution, but it can reject that hypothesis, which argues for performing a non-parametric test instead.
- Parameters:
stats_groups
alpha – float
- Returns:
- utils.mean_from_histogram(df, unit='', size_unit='', sample_unit='sample')
Function to estimate the mean size of a patient’s or sample’s DNA fragments (in base pairs) based on the fluorescence signal table. The strategy is to create a histogram first and then infer the metrics from it.
- Parameters:
df – pandas.DataFrame
unit – str, usually normalized fluorescence unit
size_unit – str, fragment size unit (base pairs)
- Returns:
float, average fragment size
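One way to picture the histogram strategy is as a signal-weighted mean, where each fragment size is weighted by the fluorescence measured at that size. The following is a minimal sketch under that assumption, not the package's implementation. The helper name `weighted_mean_size` is hypothetical, and the column names are borrowed from the defaults of `distribution_stats` for illustration.

```python
import pandas as pd


def weighted_mean_size(df, unit="normalized_fluorescent_units",
                       size_unit="bp_pos", sample_unit="sample"):
    """Hypothetical helper: signal-weighted mean fragment size per sample."""
    def _mean(group):
        # Assumption: negative baseline values carry no weight
        weights = group[unit].clip(lower=0)
        total = weights.sum()
        if total == 0:
            return float("nan")
        return float((group[size_unit] * weights).sum() / total)

    # One mean per sample, keyed by sample name
    return {name: _mean(group) for name, group in df.groupby(sample_unit)}


signal = pd.DataFrame({
    "sample": ["A", "A", "A", "B", "B", "B"],
    "bp_pos": [100, 200, 300, 100, 200, 300],
    "normalized_fluorescent_units": [1.0, 2.0, 1.0, 0.0, 1.0, 3.0],
})
means = weighted_mean_size(signal)
print(means)
```

Sample A has a symmetric signal centered on 200 bp, so its weighted mean is 200; sample B carries most of its signal at 300 bp, which pulls its mean up to 275.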
- utils.distribution_stats(df, save_dir='', unit='normalized_fluorescent_units', size_unit='bp_pos', sample_unit='sample')
Compute basic distribution statistics for each sample. Includes: skewness, entropy, AUC
- Parameters:
df – pandas dataframe
save_dir – str
unit – str
size_unit – str
sample_unit – str
- Returns:
basic stats dataframe
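To make the three named statistics concrete, here is a minimal sketch of how they could be computed per sample. The definitions are assumptions and may differ from what the package does: AUC via the trapezoidal rule, Shannon entropy of the signal normalized to sum to 1, and skewness as the signal-weighted third standardized moment. The helper name `sample_stats` is hypothetical.

```python
import numpy as np
import pandas as pd


def sample_stats(x, y):
    """Hypothetical helper: AUC, Shannon entropy, and skewness of one signal."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Area under the curve via the trapezoidal rule
    auc = float(np.sum((y[1:] + y[:-1]) / 2 * np.diff(x)))
    # Treat the signal as a probability distribution over fragment sizes
    p = y / y.sum()
    nonzero = p[p > 0]
    entropy = float(-np.sum(nonzero * np.log(nonzero)))
    # Weighted skewness: third standardized moment of the size distribution
    mean = np.sum(p * x)
    std = np.sqrt(np.sum(p * (x - mean) ** 2))
    skewness = float(np.sum(p * (x - mean) ** 3) / std ** 3)
    return {"auc": auc, "entropy": entropy, "skewness": skewness}


signal = pd.DataFrame({
    "sample": ["A"] * 3 + ["B"] * 3,
    "bp_pos": [100, 200, 300] * 2,
    "normalized_fluorescent_units": [1.0, 2.0, 1.0, 3.0, 1.0, 0.0],
})
# One row per sample, one column per statistic
stats = pd.DataFrame({
    name: sample_stats(g["bp_pos"], g["normalized_fluorescent_units"])
    for name, g in signal.groupby("sample")
}).T
print(stats)
```

In this toy input, sample A is symmetric around 200 bp and therefore has zero skewness, while sample B concentrates its signal at short fragments and is positively skewed.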
- utils.merge_tables(signal_tables, save_dir='', meta_dict=False)
Function to create a composite from multiple image outputs (multi-image processing)
- Parameters:
signal_tables – list of directories to signal tables created from gel images
save_dir – str
- Returns:
will save the composite to
- utils.wide_to_long(df, id_var='pos', var_name='sample', value_name='value')
Function to transform a wide dataframe to long format
- Parameters:
df – pandas.DataFrame in wide format
id_var – str, the column of the wide dataframe containing the id variable
var_name – str, the new column in the long dataframe containing the variable name
value_name – str, the new column in the long dataframe containing the value
- Returns:
pandas.DataFrame
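The reshaping described here corresponds to what `pandas.melt` does. The following is a minimal sketch that assumes the function behaves like a thin wrapper around `melt` (an assumption; the actual implementation may differ), using the default argument values from the signature. The sample column names are made up for illustration.

```python
import pandas as pd

# Wide format: one row per position, one column per sample
wide = pd.DataFrame({
    "pos": [100, 200],
    "sample_1": [0.1, 0.4],
    "sample_2": [0.3, 0.2],
})

# Long format: one row per (position, sample) pair
long_df = pd.melt(wide, id_vars="pos", var_name="sample", value_name="value")
print(long_df)
```

The long table has three columns (`pos`, `sample`, `value`) and one row for every combination of position and sample, which is the shape most plotting and grouping code expects.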
- utils.integrate(df, ladders_present='')
Beta: a function that will in the future help handle the “gaps” that result when using multiple ladders within the same signal table.
NOTE: Not implemented yet.
- Parameters:
df – pandas dataframe
ladders_present – list of strings
- Returns:
a new pandas dataframe that does not have nan values despite multiple ladders