analyze_electrophero
====================

.. py:module:: analyze_electrophero

.. autoapi-nested-parse::

   Main functions for electropherogram analysis. 

   Author: Anja Hess 

   Date: 2025-AUG-06 


Attributes
----------

.. autoapisummary::

   analyze_electrophero.script_path
   analyze_electrophero.maindir


Functions
---------

.. autoapisummary::

   analyze_electrophero.peak2basepairs
   analyze_electrophero.split_and_long_by_ladder
   analyze_electrophero.parse_meta_to_long
   analyze_electrophero.remove_marker_from_df
   analyze_electrophero.nuc_fractions
   analyze_electrophero.run_stats
   analyze_electrophero.marker_and_normalize
   analyze_electrophero.epg_stats
   analyze_electrophero.epg_analysis


Module Contents
---------------

.. py:data:: script_path

   Local directory of DNAvi analyze_electrophero module


.. py:data:: maindir

   Local directory of DNAvi (MAIN)


.. py:function:: peak2basepairs(df, qc_save_dir, y_label=YLABEL, x_label=XLABEL, ladder_dir='', ladder_type='custom', marker_lane=0)

   Infer ladder peaks from the signal table and annotate them with base pair positions from the user-provided ladder file.

   :param df: pandas dataframe
   :param qc_save_dir: directory to save qc results
   :param y_label: str, new name for the signal intensity values
   :param x_label: str, new name for the position values
   :param ladder_dir: str, path to where the ladder is located
   :param ladder_type: str, if set to "custom", the minimum peak height can be adjusted in the constants module.
   :return: a dictionary annotating each peak to a base pair position
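
The peak annotation described above can be pictured with a minimal sketch. It is not the DNAvi implementation: the helper ``annotate_ladder_peaks``, its ``min_height`` threshold and the example ladder sizes are hypothetical, and a plain list stands in for the signal column so that the example stays self-contained.

```python
def annotate_ladder_peaks(signal, ladder_sizes, min_height=0.2):
    """Map local maxima of a ladder trace to known fragment sizes (hypothetical helper)."""
    peaks = [
        i
        for i in range(1, len(signal) - 1)
        # A peak rises above its left neighbour, does not fall below its
        # right neighbour, and reaches at least min_height.
        if signal[i] > signal[i - 1]
        and signal[i] >= signal[i + 1]
        and signal[i] >= min_height
    ]
    if len(peaks) != len(ladder_sizes):
        raise ValueError(
            f"found {len(peaks)} peaks but the ladder lists {len(ladder_sizes)} sizes"
        )
    # Assumption: peaks and ladder sizes are both ordered from small to large.
    return dict(zip(peaks, ladder_sizes))


ladder_trace = [0.0, 0.1, 0.9, 0.1, 0.05, 0.7, 0.2, 0.0, 0.8, 0.1]
peak_dict = annotate_ladder_peaks(ladder_trace, ladder_sizes=[35, 100, 200])
print(peak_dict)
```

Here the keys are positions in the trace and the values are base pair sizes. The real function may use a different key layout and a more robust peak finder.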


.. py:function:: split_and_long_by_ladder(df)

   Handle multiple ladder types in one input dataframe while converting the data to the long format required for plotting. The base pair position for each set of DNA samples is assigned as defined by the preceding marker interpolation.

   :param df: pandas.DataFrame (wide)
   :return: pandas.DataFrame (long)
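
As a rough illustration of the wide-to-long step only (the handling of several ladders is left out), the sketch below reshapes a small table held in a plain dictionary. The helper ``wide_to_long`` and the column names are assumptions chosen for this example. With pandas, ``DataFrame.melt`` would be the usual tool for the same reshaping.

```python
def wide_to_long(wide, position_col="bp_pos"):
    """Turn one column per sample into one row per (position, sample) pair."""
    positions = wide[position_col]
    rows = []
    for sample, values in wide.items():
        if sample == position_col:
            continue  # the position column is repeated on every row instead
        for bp, value in zip(positions, values):
            rows.append({position_col: bp, "sample": sample, "value": value})
    return rows


wide = {"bp_pos": [100, 200], "sample_1": [0.2, 0.8], "sample_2": [0.5, 0.1]}
long_rows = wide_to_long(wide)
for row in long_rows:
    print(row)
```

Each sample contributes one row per base pair position, which is the layout most plotting libraries expect for grouped line plots.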


.. py:function:: parse_meta_to_long(df, metafile, sample_col='sample', source_file='', image_input=False)

   Parse the user-provided metadata and convert the data to long format.

   :param df: pandas.DataFrame (wide)
   :param metafile: str, csv path
   :param sample_col: str, column name
   :param source_file: str, path where the source data file (.csv) will be written
   :param image_input: bool, whether this dataframe was previously generated from an image file
   :return: the source data file is written to disk (.csv)


.. py:function:: remove_marker_from_df(df, peak_dict='', on='', correct_for_variant_samples=False)

   Remove the marker from the dataframe, including a halo: a number of base pairs around the marker band, defined in the constants module.

   :param df: pandas.DataFrame
   :param peak_dict: dict, previously generated with peak2basepairs
   :param on: str denoting column based on which dataframe will be cut
   :param correct_for_variant_samples: bool, if True, each sample is
      checked individually for the end of the marker peaks and cropped accordingly.
      Defaults to False, meaning that the marker halo is estimated from the first sample.
   :return: pd.DataFrame, cleared from marker-associated data points
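
The idea of a marker halo can be sketched as a simple filter. This is an assumption-laden illustration, not the module's code: the helper ``drop_marker_region``, the 10 bp halo and the symmetric window around the marker are all hypothetical, and rows are plain dictionaries rather than a dataframe.

```python
def drop_marker_region(rows, marker_bp, halo_bp=10, on="bp_pos"):
    """Keep only rows whose position lies outside marker_bp +/- halo_bp."""
    return [row for row in rows if abs(row[on] - marker_bp) > halo_bp]


rows = [
    {"bp_pos": bp, "value": value}
    for bp, value in [(30, 0.9), (35, 1.0), (44, 0.6), (60, 0.2), (150, 0.7)]
]
cleaned = drop_marker_region(rows, marker_bp=35, halo_bp=10)
print([row["bp_pos"] for row in cleaned])
```

Positions within 10 bp of the 35 bp marker are discarded, so only the data points at 60 and 150 bp remain.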


.. py:function:: nuc_fractions(df, unit='', size_unit='', nuc_dict=NUC_DICT)

   Estimate nucleosomal fractions (percentages) of a sample's cfDNA based on pre-defined base pair ranges.

   :param df: pandas.DataFrame
   :param unit: str, usually normalized fluorescence unit
   :param size_unit: str, fragment size unit (base pairs)
   :return: pd.DataFrame of nucleosomal fractions
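
One way to read "fractions based on base pair ranges" is the share of the total signal that falls inside each range. The sketch below follows that reading. The helper ``fraction_per_range`` and the ranges in ``example_ranges`` are made up for illustration and are not the values stored in ``NUC_DICT``.

```python
def fraction_per_range(rows, ranges, unit="value", size_unit="bp_pos"):
    """Return the percentage of total signal inside each [start, end) range."""
    total = sum(row[unit] for row in rows)
    if total == 0:
        raise ValueError("total signal is zero; fractions are undefined")
    fractions = {}
    for name, (start, end) in ranges.items():
        in_range = sum(row[unit] for row in rows if start <= row[size_unit] < end)
        fractions[name] = 100 * in_range / total
    return fractions


# Hypothetical ranges, chosen only to make the arithmetic easy to follow.
example_ranges = {"mono": (100, 200), "di": (200, 400)}
sample_rows = [
    {"bp_pos": 150, "value": 3.0},
    {"bp_pos": 170, "value": 3.0},
    {"bp_pos": 320, "value": 2.0},
    {"bp_pos": 900, "value": 2.0},
]
fractions = fraction_per_range(sample_rows, example_ranges)
print(fractions)
```

The percentages need not add up to 100, because signal outside all listed ranges (here the point at 900 bp) still counts toward the total.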


.. py:function:: run_stats(df, variable='', category='', paired=False, alpha=0.05, region_id='region_id')

   Perform statistical tests (parametric or non-parametric) to infer the significance of differences in mean base pair fragment size between patients/samples from different groups.

   :param df: pandas.DataFrame
   :param variable: continuous variable
   :param category: categorical variable
   :param paired: boolean
   :return: statistics per group in a dataframe


.. py:function:: marker_and_normalize(df, peak_dict='', include_marker=False, normalize=True, normalize_to=False, correct=False)

   Normalize the raw DNA fluorescence intensity to a value between 0 and 1.

   :param df: pandas.DataFrame
   :param peak_dict: dict, previously generated with peak2basepairs
   :param include_marker: bool, whether to include markers
   :return: pd.DataFrame, now with normalized DNA fluorescence intensity
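
Scaling to the interval from 0 to 1 is commonly done with min-max normalization, which ``epg_analysis`` names for its ``normalize`` option. The sketch below shows the arithmetic on a plain list. The helper ``min_max_normalize`` is hypothetical, and returning zeros for a flat signal is a choice made for this example, not documented behavior.

```python
def min_max_normalize(values):
    """Rescale values linearly so that the minimum maps to 0 and the maximum to 1."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # A flat signal has no range to scale by; return zeros (an assumption).
        return [0.0 for _ in values]
    span = highest - lowest
    return [(value - lowest) / span for value in values]


raw_intensity = [120.0, 300.0, 480.0]
normalized = min_max_normalize(raw_intensity)
print(normalized)
```

After rescaling, traces from lanes with different loading amounts share a common intensity scale, which makes their shapes comparable.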


.. py:function:: epg_stats(df, save_dir='', unit='normalized_fluorescent_units', size_unit='bp_pos', metric_unit='value', nuc_dict=NUC_DICT, paired=False, region_id='region_id', cut=False)

   Compute and output basic statistics for DNA size distributions

   :param df: pandas.DataFrame
   :param save_dir: string, where to save the statistics to
   :param unit: string (y-variable)
   :param size_unit: string (x-variable)
   :param paired: bool, whether measurements were paired
   :return: saves three dataframes as .csv files in the stats directory: basic_statistics.csv, peak_statistics.csv, and group_statistics_by_CATEGORICAL-VAR.csv


.. py:function:: epg_analysis(path_to_file, path_to_ladder, path_to_meta, run_id=None, include_marker=False, image_input=False, save_dir=False, marker_lane=0, nuc_dict=NUC_DICT, paired=False, normalize=True, normalize_to=False, correct=False, cut=False)

   Core function to analyze DNA distribution from a signal table.

   :param path_to_file: str, path where the signal table is stored
   :param path_to_ladder: str, path to where the ladder file is stored
   :param path_to_meta: str, path to metadata file
   :param run_id: str, name for the analysis, based on user input or the name of the signal table file
   :param include_marker: bool, whether to include the marker in the analysis
   :param image_input: bool, whether the signal table was generated from an image
   :param save_dir: bool or str, where to save the statistics to. Default: False
   :param paired: bool, whether to perform a paired statistical analysis
   :param normalize: bool, whether to perform min-max normalization
   :param normalize_to: str or False, name of the sample to which all other samples are normalized
   :return: runs the analysis and plotting functions and creates multiple outputs in the result folder