utils
=====

.. py:module:: utils

.. autoapi-nested-parse::

   Utilities and additional functions for electropherogram analysis.

   Author: Anja Hess

   Date: 2025-NOV-10



Functions
---------

.. autoapisummary::

   utils.vartest
   utils.normality
   utils.mean_from_histogram
   utils.distribution_stats
   utils.merge_tables
   utils.wide_to_long
   utils.integrate


Module Contents
---------------

.. py:function:: vartest(stats_groups, alpha=0.05)

   Validates that the ANOVA is only applied to normally distributed data.

   :param stats_groups:
   :param alpha:
   :return:


.. py:function:: normality(stats_groups, alpha=0.05)

   While the Shapiro-Wilk test cannot confirm that a sample stems from a
   normal distribution, it can at least reject this hypothesis and thereby
   justify the use of a non-parametric test.

   :param stats_groups:
   :param alpha: float
   :return:


.. py:function:: mean_from_histogram(df, unit='', size_unit='', sample_unit='sample')

   Estimate the mean size (in base pairs) of a patient's or sample's DNA
   fragments from the fluorescence signal table. The strategy is to build a
   histogram first and then infer the metrics from it.

   :param df: pandas.DataFrame
   :param unit: str, usually the normalized fluorescence unit
   :param size_unit: str, fragment size unit (base pairs)
   :return: float, average fragment size


.. py:function:: distribution_stats(df, save_dir='', unit='normalized_fluorescent_units', size_unit='bp_pos', sample_unit='sample')

   Compute basic distribution statistics for each sample: skewness,
   entropy, and AUC (area under the curve).

   :param df: pandas.DataFrame
   :param save_dir: str
   :param unit: str
   :param size_unit: str
   :param sample_unit: str
   :return: pandas.DataFrame with the basic statistics


.. py:function:: merge_tables(signal_tables, save_dir='', meta_dict=False)

   Create a composite from multiple image outputs (multi-image processing).

   :param signal_tables: list of directories to signal tables created from gel images
   :param save_dir: str
   :return: will save the composite to


.. py:function:: wide_to_long(df, id_var='pos', var_name='sample', value_name='value')

   Convert a wide DataFrame to long format.

   :param df: pandas.DataFrame in wide format
   :param id_var: str, the column of the wide DataFrame containing the id variable
   :param var_name: str, the new column in the long DataFrame containing the variable name
   :param value_name: str, the new column in the long DataFrame containing the value
   :return: pandas.DataFrame


.. py:function:: integrate(df, ladders_present='')

   Beta: a function that will, in the future, help handle the resulting
   "gaps" when multiple ladders are used within the same signal table.

   NOTE: Not implemented yet.

   :param df: pandas.DataFrame
   :param ladders_present: list of str
   :return: a new pandas.DataFrame without NaN values, even when multiple ladders are present
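
The ``vartest`` and ``normality`` docstrings describe checking the ANOVA assumptions before a test is chosen. A minimal sketch of that decision follows, assuming ``stats_groups`` is a list of numeric sequences (one per group, at least three values each) and using SciPy's Shapiro-Wilk and Levene tests as stand-ins. The helper ``choose_group_test`` and its Kruskal-Wallis fallback are hypothetical and not taken from ``utils``.

.. code-block:: python

   from scipy import stats


   def choose_group_test(stats_groups, alpha=0.05):
       """Hypothetical helper (not part of utils): pick ANOVA or Kruskal-Wallis.

       Assumes ``stats_groups`` is a list of numeric sequences, one per group,
       each with at least three values (the Shapiro-Wilk minimum).
       """
       # Shapiro-Wilk per group: p < alpha rejects normality for that group.
       all_normal = all(stats.shapiro(g).pvalue >= alpha for g in stats_groups)
       # Levene's test (an assumed choice): p < alpha rejects equal variances.
       equal_var = stats.levene(*stats_groups).pvalue >= alpha
       if all_normal and equal_var:
           return "anova", stats.f_oneway(*stats_groups).pvalue
       # Otherwise fall back to the rank-based, non-parametric alternative.
       return "kruskal", stats.kruskal(*stats_groups).pvalue


   # The third group is dominated by a single outlier, so normality is rejected.
   groups = [[1, 2, 3, 4, 5], [2, 3, 4, 5, 6], [1, 1, 1, 1, 50]]
   print(choose_group_test(groups))

A failed Shapiro-Wilk test only rejects normality; passing it does not prove it, which is why the sketch treats "not rejected" as sufficient for ANOVA rather than as confirmation.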
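
For ``mean_from_histogram``, one way to read "build a histogram and infer the metrics" is to treat each fragment size as a bin whose height is the fluorescence signal, so that the mean size is the signal-weighted average of the sizes. The sketch below assumes a long-format table with one row per sample and size, and borrows its column names from the ``distribution_stats`` defaults. ``weighted_mean_size`` is a hypothetical name, not the module's implementation.

.. code-block:: python

   import pandas as pd


   def weighted_mean_size(df, unit="normalized_fluorescent_units",
                          size_unit="bp_pos", sample_unit="sample"):
       """Hypothetical sketch: signal-weighted mean fragment size per sample.

       Assumes one row per (sample, size) and a non-negative signal in ``unit``.
       """
       # Each size acts as a histogram bin whose height is the fluorescence.
       weighted = df.assign(_wx=df[unit] * df[size_unit])
       sums = weighted.groupby(sample_unit)[[unit, "_wx"]].sum()
       # Histogram mean = sum(size * height) / sum(height), per sample.
       return sums["_wx"] / sums[unit]


   trace = pd.DataFrame({
       "sample": ["A", "A", "B", "B"],
       "bp_pos": [100, 200, 100, 300],
       "normalized_fluorescent_units": [1.0, 1.0, 1.0, 3.0],
   })
   print(weighted_mean_size(trace))

Here sample ``A`` has equal signal at 100 and 200 bp, giving 150 bp, while sample ``B`` carries three times more signal at 300 bp, pulling its mean to 250 bp.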
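
The statistics listed for ``distribution_stats`` can be sketched for a single trace by treating the normalized signal as a probability mass over fragment sizes. This assumes moment-based skewness, Shannon entropy in nats, and a trapezoidal AUC over the size axis; the module may define any of these differently, and ``signal_stats`` is a hypothetical name.

.. code-block:: python

   import numpy as np


   def signal_stats(sizes, signal):
       """Hypothetical sketch: skewness, Shannon entropy and AUC of one trace.

       ``sizes`` are fragment sizes in bp (ascending); ``signal`` is the
       non-negative fluorescence at each size, spread over more than one size.
       """
       x = np.asarray(sizes, dtype=float)
       y = np.asarray(signal, dtype=float)
       p = y / y.sum()  # signal normalized to a probability mass over sizes
       mean = np.sum(p * x)
       var = np.sum(p * (x - mean) ** 2)
       # Third standardized moment of the size distribution.
       skewness = np.sum(p * (x - mean) ** 3) / var ** 1.5
       nz = p[p > 0]  # 0 * log(0) is taken as 0, so drop empty bins
       entropy = -np.sum(nz * np.log(nz))
       # Trapezoidal rule: area under the raw signal along the size axis.
       auc = np.sum((y[1:] + y[:-1]) / 2 * np.diff(x))
       return {"skewness": float(skewness), "entropy": float(entropy),
               "auc": float(auc)}


   print(signal_stats([100, 200, 300], [1, 2, 1]))

For the symmetric trace in the usage line, the skewness is zero and the AUC is 300 (two trapezoids of width 100 and mean height 1.5).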
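
The reshaping described for ``wide_to_long`` corresponds to what ``pandas.DataFrame.melt`` does. A minimal sketch under that assumption follows; ``wide_to_long_sketch`` is a hypothetical name, and the module's own function may differ in details such as column order.

.. code-block:: python

   import pandas as pd


   def wide_to_long_sketch(df, id_var="pos", var_name="sample", value_name="value"):
       """Hypothetical equivalent of ``utils.wide_to_long`` built on ``melt``."""
       # Every column other than ``id_var`` is treated as one sample and stacked.
       return df.melt(id_vars=[id_var], var_name=var_name, value_name=value_name)


   wide = pd.DataFrame({"pos": [1, 2], "s1": [0.1, 0.2], "s2": [0.3, 0.4]})
   print(wide_to_long_sketch(wide))

``melt`` fits this case better than ``pandas.wide_to_long``, which expects the value columns to share a common stub prefix with a suffix encoding the variable.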