data_checks

Functions to ensure input files for DNAvi are correctly formatted

Author: Anja Hess

Date: 2025-JUL-23

Functions

check_marker_lane(input_nr)

Quickly check if the number for the marker lane is positive

detect_delim(file[, num_rows])

Detect delimiter from input table with Sniffer

check_name(filename)

Function to generate secure filename from filename

check_input(filename)

Function to check if the input exists

check_file(filename)

Function to check if file is correctly formatted

check_ladder(filename)

Function to check if the ladder is formatted correctly

check_meta(filename)

Check if the metadata file is formatted correctly

check_config(filename)

Check if the config file is formatted correctly

compute_nuc_intervals(start[, step, total_steps, prefixes])

Compute interpretable nucleosomal intervals and format them

check_interval(interval_string[, max_val])

Check if the interval string is formatted correctly

generate_meta_dict(meta_path[, files])

A function to conveniently parse metadata for multiple files

Module Contents

data_checks.check_marker_lane(input_nr)

Quickly check if the number for the marker lane is positive

Parameters:

input_nr – int

Returns:

int if check passed
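A minimal sketch of such a check, for illustration only. The name check_marker_lane_sketch, the use of ValueError, and the choice to reject 0 are assumptions; the actual DNAvi implementation may differ.

```python
def check_marker_lane_sketch(input_nr):
    """Return input_nr as an int if it is a positive integer.

    Hypothetical helper: DNAvi's real check may raise a different
    exception type or treat 0 differently.
    """
    try:
        value = int(input_nr)
    except (TypeError, ValueError) as err:
        raise ValueError(f"Marker lane must be an integer, got {input_nr!r}") from err
    if value < 1:
        # Assumption: lanes are counted from 1, so 0 is rejected too.
        raise ValueError(f"Marker lane must be positive, got {value}")
    return value
```

Accepting strings as well as ints makes the helper usable directly on command-line arguments.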

data_checks.detect_delim(file, num_rows=1)

Detect delimiter from input table with Sniffer

Parameters:
  • file – str, path to input file

  • num_rows – int, number of rows in file

Returns:

str, detected delimiter
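The following sketch shows how delimiter detection with the standard library's csv.Sniffer can work. The function name detect_delim_sketch and the restriction to comma, semicolon and tab are assumptions made for this example, as is reading num_rows as the number of leading rows to sample.

```python
import csv


def detect_delim_sketch(file, num_rows=1):
    """Guess the delimiter from the first num_rows lines of a text table."""
    with open(file, newline="") as handle:
        # Read only the leading rows; readline() returns "" past end of file.
        sample = "".join(handle.readline() for _ in range(num_rows))
    # Assumption: only these three candidate delimiters are considered.
    dialect = csv.Sniffer().sniff(sample, delimiters=",;\t")
    return dialect.delimiter
```

csv.Sniffer raises csv.Error when it cannot determine a delimiter, so callers may want to catch that and report the offending file.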

data_checks.check_name(filename)

Function to generate secure filename from filename

Parameters:

filename – str

Returns:

improved file name
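As an illustration of what generating a secure file name can involve, here is a standard-library sketch. The name check_name_sketch and the exact sanitising rules (drop directories, replace whitespace, keep only letters, digits, dot, underscore and hyphen) are assumptions, not the rules DNAvi applies.

```python
import os
import re


def check_name_sketch(filename):
    """Return a conservative, filesystem-safe version of filename."""
    # Drop any directory components, treating backslashes as separators too.
    base = os.path.basename(filename.replace("\\", "/"))
    # Collapse runs of whitespace into a single underscore.
    base = re.sub(r"\s+", "_", base.strip())
    # Keep only a small whitelist of characters (assumed rule set).
    base = re.sub(r"[^A-Za-z0-9._-]", "", base)
    # Avoid hidden files and empty results.
    return base.lstrip(".") or "unnamed"
```

For example, check_name_sketch("../my sample (1).csv") yields "my_sample_1.csv" under these rules.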

data_checks.check_input(filename)

Function to check if the input exists

Parameters:

filename – str

Returns:

raise error if file does not exist
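An existence check of this kind could look like the sketch below. The name check_input_sketch, the FileNotFoundError exception, and returning the path on success are assumptions for illustration.

```python
import os


def check_input_sketch(filename):
    """Return filename unchanged if it exists, otherwise raise an error."""
    # os.path.exists is true for both files and directories.
    if not os.path.exists(filename):
        raise FileNotFoundError(f"Input does not exist: {filename}")
    return filename
```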

data_checks.check_file(filename)

Function to check if file is correctly formatted

Parameters:

filename – str

Returns:

raise error if file is incorrectly formatted

data_checks.check_ladder(filename)

Function to check if the ladder is formatted correctly

Parameters:

filename – str

Returns:

raise error if file does not have correct format

data_checks.check_meta(filename)

Check if the metadata file is formatted correctly

Parameters:

filename – str, path to metadata file

Returns:

raise error if file does not have correct format

data_checks.check_config(filename)

Check if the config file is formatted correctly

Parameters:

filename – str, path to config file

Returns:

raise error if file does not have correct format

data_checks.compute_nuc_intervals(start, step=200, total_steps=10, prefixes=['Mono', 'Di', 'Tri', 'Tetra', 'Penta', 'Hexa', 'Hepta', 'Octa', 'Nona', 'Deca'])

Compute interpretable nucleosomal intervals and format them into a common DNAvi nuc dict.

Parameters:
  • start

  • step

  • total_steps

  • prefixes

Returns:

new nuc_dict (Python dictionary)
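The sketch below illustrates one way such intervals could be derived from the parameters in the signature. The dictionary layout (prefix mapped to a (lower, upper) tuple), the contiguous non-overlapping intervals, and the name compute_nuc_intervals_sketch are assumptions; the actual DNAvi nuc dict format is not specified here and may differ.

```python
def compute_nuc_intervals_sketch(
    start,
    step=200,
    total_steps=10,
    prefixes=("Mono", "Di", "Tri", "Tetra", "Penta",
              "Hexa", "Hepta", "Octa", "Nona", "Deca"),
):
    """Build a dict mapping each prefix to an assumed (lower, upper) interval."""
    nuc_dict = {}
    # Use at most total_steps prefixes; each interval is step wide.
    for i, prefix in enumerate(prefixes[:total_steps]):
        lower = start + i * step
        nuc_dict[prefix] = (lower, lower + step)
    return nuc_dict
```

The sketch uses a tuple as the default for prefixes to avoid sharing a mutable default between calls.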

data_checks.check_interval(interval_string, max_val=100000)

Check if the interval string is formatted correctly

Parameters:
  • interval_string – str, interval to check

  • max_val – int, maximum allowed value (default 100000)

Returns:

raise error if the interval does not have correct format

data_checks.generate_meta_dict(meta_path, files=[])

A function to conveniently parse metadata for multiple files when handling multi-file inputs

Parameters:
  • meta_path – path to metadata file

  • files – list

Returns:

dictionary parsing the new split metadata file for each input file