The qpcr.Readers

This module provides different Reader classes that allow reading simple and complex datafiles of various architectures.

Learn more about the available Readers. If you are interested in learning more about preprocessing the datafiles, check out the Decorator tutorial.

SingleReader

The SingleReader is able to read both regular and irregular single-assay datafiles. It can also read multi-assay datafiles, but then requires an assay argument that specifies which assay to extract.

id	Ct	other_data
ctrl1	5.67	…
ctrl2	5.79	…
ctrl3	5.86	…
condA1	5.34	…
…	…	…

from qpcr.Readers import SingleReader

reader = SingleReader()

myfile = "my_datafile.csv"

assay = reader.pipe( myfile )

MultiReader

The MultiReader can read irregular multi-assay datafiles and extract all assays from them either using a specific assay_pattern to find them or using decorators (check out the documentation of the qpcr.Parsers for more information).

Some meta-data here	maybe today’s date
Assay 1
id	Ct	other_data
ctrl1	5.67	…
ctrl2	5.79	…
…	…
			<- blank line here!
Assay 2
id	Ct	other_data
ctrl1	10.23	…
ctrl2	10.54	…
…	…

from qpcr.Readers import MultiReader

myfile = "my_datafile.xlsx"

reader = MultiReader()

# if we have a non-decorated file we can just use
assays = reader.pipe( myfile )

# if we have decorated our file, then we can use
assays, normalisers = reader.pipe( myfile, decorator = True )
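
To illustrate what extracting assays from such an irregular file involves, here is a minimal sketch in plain Python. It is not the MultiReader's implementation: the helper split_assays and its regular expression are hypothetical stand-ins for the assay_pattern idea, and the sample text only mimics the layout shown above.

```python
import re

# Made-up sample that mimics the irregular layout shown above (tab-separated).
RAW = (
    "Some meta-data here\tmaybe a date\n"
    "Assay 1\n"
    "id\tCt\n"
    "ctrl1\t5.67\n"
    "ctrl2\t5.79\n"
    "\n"
    "Assay 2\n"
    "id\tCt\n"
    "ctrl1\t10.23\n"
    "ctrl2\t10.54\n"
)


def split_assays(text, assay_pattern=r"Assay \d+"):
    """Collect (id, Ct) rows under the most recent line matching assay_pattern.

    Hypothetical helper for illustration only; not part of qpcr.
    """
    pattern = re.compile(assay_pattern)
    assays = {}
    current = None  # name of the assay table we are currently inside
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            current = None  # a blank line closes the current assay table
        elif pattern.fullmatch(stripped):
            current = stripped
            assays[current] = []
        elif current is not None:
            cells = stripped.split("\t")
            if cells[0] == "id":
                continue  # skip the header row of each table
            assays[current].append((cells[0], float(cells[1])))
    return assays


assays = split_assays(RAW)
print(list(assays))
```

The meta-data line is ignored because it appears before any line that matches the pattern, which is also why the blank line between the assay tables matters in this layout.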

MultiSheetReader

The MultiSheetReader is able to read irregular multi-assay datafiles that contain assays in multiple datasheets.

By default, the MultiSheetReader reads all sheets in an excel file. However, you can use the sheet argument to specify a single sheet that should be read exclusively (which effectively turns the reader into a “MultiReader”).

BigTableReader

The BigTableReader is able to read datafiles that store their assays in one single “big table”. It can extract all assays from that big table using either simple extraction methods or decorators depending on the type of big table (check out the documentation of the BigTableReader for more information on the types of “big tables” and how to read them).

“Vertical” BigTables

assay	id	Ct	other_data
assay 1	ctrl1	5.67	…
assay 1	ctrl2	5.79	…
…	…	…	…
assay 2	ctrl1	10.23	…
…	…	…	…

from qpcr.Readers import BigTableReader

myfile = "my_datafile.xlsx"

reader = BigTableReader()

assays, normalisers = reader.pipe(
                                    filename = myfile,
                                    decorator = True,

                                    # specify the kind of big table
                                    kind = "vertical",

                                    # specify which columns store
                                    # the relevant data
                                    assay_col = "Assay",
                                    id_col = "Name",
                                    ct_col = "Ct"
                            )
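
Conceptually, reading a vertical table amounts to grouping rows by the assay column. The following sketch shows that idea with the standard library only. The function split_vertical and the sample data are assumptions made for illustration; they are not how the BigTableReader is implemented.

```python
import csv
import io
from collections import defaultdict

# Made-up vertical big table using the column names from the call above.
VERTICAL = (
    "Assay\tName\tCt\n"
    "assay 1\tctrl1\t5.67\n"
    "assay 1\tctrl2\t5.79\n"
    "assay 2\tctrl1\t10.23\n"
    "assay 2\tctrl2\t10.54\n"
)


def split_vertical(text, assay_col, id_col, ct_col):
    """Group (id, Ct) pairs by the value in the assay column.

    Hypothetical helper for illustration only; not part of qpcr.
    """
    groups = defaultdict(list)
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    for row in reader:
        groups[row[assay_col]].append((row[id_col], float(row[ct_col])))
    return dict(groups)


vertical_assays = split_vertical(
    VERTICAL, assay_col="Assay", id_col="Name", ct_col="Ct"
)
print(sorted(vertical_assays))
```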

“Horizontal” BigTables

assay	ctrl1	ctrl2	…	other_data
assay 1	5.67	5.79	…	…
assay 2	10.23	10.54	…	…
…	…	…	…	…

from qpcr.Readers import BigTableReader

myfile = "my_datafile.xlsx"

reader = BigTableReader()

assays, normalisers = reader.pipe(
                                    filename = myfile,

                                    # we specify that it's a horizontal table
                                    kind = "horizontal",

                                    # we **have to** specify the
                                    # number of replicates per group
                                    replicates = 3,

                                    # we also specify the group names (because they
                                    # will not be inferrable due to the different
                                    # column names of the replicates)
                                    names = ["control", "condition A", "condition B", "condition C"],

                                    # we must also specify which column specifies the assays
                                    # NOTE: in this mode this is handled by the `id_col` argument!
                                    id_col = "Name"

                                )
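
The reason replicates and names have to be supplied can be seen in a small sketch: in a horizontal table each row is one assay, and the Ct columns only become groups once they are cut into chunks of a known size. The helper split_horizontal and the sample values below are hypothetical and only illustrate the idea; the BigTableReader itself may work differently.

```python
import csv
import io

# Made-up horizontal big table: one row per assay, one column per replicate.
HORIZONTAL = (
    "Name\tctrl1\tctrl2\tctrl3\tcondA1\tcondA2\tcondA3\n"
    "assay 1\t5.67\t5.79\t5.86\t5.34\t5.41\t5.38\n"
    "assay 2\t10.23\t10.54\t10.31\t10.02\t10.11\t10.07\n"
)


def split_horizontal(text, id_col, replicates, names):
    """Cut each assay row into len(names) groups of `replicates` Ct values.

    Hypothetical helper for illustration only; not part of qpcr.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    id_index = header.index(id_col)
    assays = {}
    for row in body:
        values = [float(v) for i, v in enumerate(row) if i != id_index]
        if len(values) != replicates * len(names):
            raise ValueError("replicates x names does not match the Ct columns")
        assays[row[id_index]] = {
            name: values[k * replicates:(k + 1) * replicates]
            for k, name in enumerate(names)
        }
    return assays


horizontal_assays = split_horizontal(
    HORIZONTAL, id_col="Name", replicates=3, names=["control", "condition A"]
)
print(horizontal_assays["assay 1"]["control"])
```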

“Hybrid” BigTables

id	assay 1	assay 2	other_data
ctrl	7.65	11.78	…
ctrl	7.87	11.56	…
ctrl	7.89	11.76	…
condA	7.56	11.98	…
condA	7.34	11.56	…
…	…	…	…

from qpcr.Readers import BigTableReader

myfile = "my_datafile.xlsx"

reader = BigTableReader()

assays, normalisers = reader.pipe(
                                    filename = myfile,

                                    # we specify that it's a hybrid table
                                    kind = "hybrid",

                                    # we **have to** specify the
                                    # number of replicates per group
                                    replicates = 3,

                                    # we also specify the group names (because they
                                    # will not be inferrable due to the different
                                    # column names of the replicates)
                                    names = ["control", "condition A", "condition B", "condition C"],

                                    id_col = "Sample",

                                    # in this setup each assay is already stored as a separate datacolumn
                                    # so we must provide either a list of the column names or use decorators
                                    ct_col = ["ActB", "HNRNPL", "SRSF11"],
                                )

# NOTE: Because we did not specify any decorators here, the normalisers list will be created but be empty!
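
As a rough illustration of the ct_col list, the sketch below pulls one (id, Ct) series per listed column out of a hybrid table and ignores every other column. The helper split_hybrid and the sample data are assumptions for illustration only and do not reflect the BigTableReader's internals.

```python
import csv
import io

# Made-up hybrid big table: one id column, one Ct column per assay.
HYBRID = (
    "Sample\tActB\tHNRNPL\tother_data\n"
    "ctrl\t7.65\t11.78\tx\n"
    "ctrl\t7.87\t11.56\tx\n"
    "condA\t7.56\t11.98\tx\n"
    "condA\t7.34\t11.56\tx\n"
)


def split_hybrid(text, id_col, ct_col):
    """Return one list of (id, Ct) pairs per column named in ct_col.

    Hypothetical helper for illustration only; not part of qpcr.
    """
    assays = {name: [] for name in ct_col}
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        for name in ct_col:
            assays[name].append((row[id_col], float(row[name])))
    return assays


hybrid_assays = split_hybrid(HYBRID, id_col="Sample", ct_col=["ActB", "HNRNPL"])
print(hybrid_assays["ActB"])
```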

class qpcr.Readers.Readers.BigTableReader[source]

Bases: qpcr.Readers.Readers.MultiReader

Reads a single multi-assay datafile and extracts assays-of-interest and normaliser-assays based on decorators.

Input Data Files

Valid input files are multi-assay irregular csv or excel files, that specify assays as one big table containing all information together. Note that this implies that the entire data is stored in a single sheet (if using excel files).

Three possible data architectures are allowed:

Vertical Big Tables

Big Tables of this kind require three columns (any additional columns are disregarded): one specifying the assay, one specifying the replicate identifiers, and one specifying the Ct values. An additional fourth column (@qpcr) may be filled with decorators but this is not necessary in this setup.

Horizontal Big Tables

Big Tables of this kind store replicates from assays in side-by-side columns. The replicates may be labelled numerically or all have the same column header. A second column is required specifying the replicate identifier.

Note, this kind of setup requires decorators above the first replicate of each assay, as well as user-defined replicates!

Note, the column headers have to be unique to the table! Also, a word of warning with regard to replicate assays: the entries in the assay-defining column must be unique! If you have multiple assays from the same gene, which therefore also have the same id, they will be interpreted as belonging together and will be assembled into the same qpcr.Assay object. However, this will result in differently sized Assays, which will cause problems downstream when you (or a qpcr.Normaliser) try to assemble a qpcr.Results object!

Hybrid Big Tables

Big Tables of this kind store Ct values of different assays in separate side-by-side columns, but they store the replicate identifiers as a separate column. Hence, they combine aspects of vertical and horizontal Big Tables.

Note, two options exist to read this kind of setup:

A list of ct_col values can be passed, which contains the column header of each assay.

The table can be decorated, in which case only decorated assays (columns) are extracted.

Please note that the two methods of reading this table are mutually exclusive! So, if you decorate your table, you cannot pass specific assay headers to the ct_col argument anymore.
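
Because repeated entries in the assay-defining column of a horizontal Big Table are assembled into the same qpcr.Assay, it can help to check for them before reading. The sketch below is one way to do that in plain Python. The helper find_duplicate_assays is hypothetical and not part of qpcr, and the gene names are just sample input.

```python
from collections import Counter


def find_duplicate_assays(assay_ids):
    """Return the assay ids that occur more than once, sorted by name.

    Hypothetical helper for illustration only; not part of qpcr.
    """
    counts = Counter(assay_ids)
    return sorted(name for name, n in counts.items() if n > 1)


# Sample assay column in which one gene was measured twice.
duplicates = find_duplicate_assays(["ActB", "HNRNPL", "ActB", "SRSF11"])
print(duplicates)
```

If the returned list is not empty, renaming the repeated entries (for instance by appending a run label) keeps them apart.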

parse(**kwargs)[source]: Parses the big table and extracts the individual assays.

pipe(filename: str, kind: str, id_col: str, **kwargs)[source]

A wrapper for read+parse+make_Assays

Note

This is the suggested use of the BigTableReader.

Parameters

filename (str) – A filepath to a raw data file, containing multiple assays that were decorated. Check out the documentation of the qpcr.Parsers to learn more about decorators.
kind (str) – Specifies the kind of Big Table from the file. This may either be "horizontal", "vertical", or "hybrid".
id_col (str) – The column header specifying the replicate identifiers (or “assays” in case of horizontal big tables).
**kwargs – Any additional columns or keyword arguments.

Returns

assays (dict or list) – Returns either the raw dictionary of dataframes returned by the Parser (if no qpcr.Assays could be made automatically) or a list of qpcr.Assay objects.
normalisers (dict or list) – Returns either the raw dictionary of dataframes returned by the Parser (if no qpcr.Assays could be made automatically) or a list of qpcr.Assay objects.

read(filename: str, kind: str, id_col: str, **kwargs)[source]

Reads a regular or irregular csv or excel datafile that contains data stored in a single big table. The Reader first tries to read the file regularly; if this fails, it resorts to parsing to identify the relevant sections of the data.

Parameters

filename (str) – A filepath to a raw data file, containing multiple assays that were decorated. Check out the documentation of the qpcr.Parsers to learn more about decorators.
kind (str) – Specifies the kind of Big Table from the file. This may either be "horizontal", "vertical", or "hybrid".
id_col (str) – The column header specifying the replicate identifiers (or “assays” in case of horizontal big tables).
**kwargs – Any additional columns or keyword arguments.

class qpcr.Readers.Readers.MultiReader(filename: Optional[str] = None, **kwargs)[source]

Bases: qpcr.Readers.Readers.SingleReader, qpcr._auxiliary._ID

Reads a single multi-assay datafile and extracts assays-of-interest and normaliser-assays based on decorators.

Input Data Files

Valid input files are multi-assay irregular csv or excel files, that specify assays by one replicate identifier column and one Ct value column.

Separate assay tables may be either below one another (separated by blank lines!) or beside one another (requires transpose = True), but ALL in the SAME sheet!

Assays of interest and normaliser assays must be marked using decorators.

Note

MultiReader can transform the extracted datasets directly into qpcr.Assay objects using MultiReader.make_Assays(). It will perform grouping of assays if possible but will return raw assays if not! get will return either a dictionary of the raw dataframes or a list of qpcr.Assay objects.

Parameters

filename (str) – A filepath to a raw data file, containing multiple assays that were decorated. Check out the documentation of the qpcr.Parsers to learn more about decorators.
**kwargs – Any additional keyword arguments that should be passed to the read method which is immediately called during init if a filename is provided.

assays(which: Optional[str] = None)[source]

Parameters

which (str) – If specified it only returns the data for the specified assay. Otherwise (default) it returns all assays.

Returns

data (dict or list) – Returns either the raw dictionary of dataframes returned by the Parser (if make_Assays has not been run yet) or a list of qpcr.Assay objects.
names (list) – A list of the names of all extracted assays.

clear()[source]: Clears all the extracted data from the Reader

get(which: str)[source]

Returns the stored assays or normalisers.

Parameters: which (str) – Can be either "assays" or "normalisers" or any specific assay identifier.
Returns: data – Returns either the raw dictionary of dataframes returned by the Parser (if make_Assays has not been run yet) or a list of qpcr.Assay objects.
Return type: dict or list

make_Assays()[source]: Convert all found assays and normalisers into qpcr.Assay objects.

normalisers(which: Optional[str] = None)[source]

Parameters

which (str) – If specified it only returns the data for the specified normaliser. Otherwise (default) it returns all normalisers.

Returns

data (dict or list) – Returns either the raw dictionary of dataframes returned by the Parser (if make_Assays has not been run yet) or a list of qpcr.Assay objects.
names (list) – A list of the names of all extracted normalisers.

parse(**kwargs)[source]

Extracts the datasets (assays) from the read datafile.

Parameters: **kwargs – Any additional keyword arguments that should be passed to the parse method of the qpcr.Parsers that extracts the datasets.

pipe(filename: str, **kwargs)[source]

A wrapper for read+parse+make_Assays

Note

This is the suggested use of MultiReader. If a directory has been specified into which the datafiles shall be saved, then saving will automatically be done.

Parameters

filename (str) – A filepath to an input datafile.
**kwargs – Any additional keyword argument that will be passed to any of the wrapped methods.

Returns

data – A tuple of the found assays-of-interest (first element) and normaliser-assays (second element).

Return type

tuple

read(filename: str, **kwargs)[source]

Reads a multi-assay datafile with decorated assays. Any non-decorated assays are ignored!

Parameters

filename (str) – A filepath to a raw data file, containing multiple assays that were decorated. Check out the documentation of the qpcr.Parsers to learn more about decorators.
**kwargs – Any additional keyword arguments that should be passed to the read method of the qpcr.Parsers that extracts the datasets.

save_to(location: Optional[str] = None)[source]

Sets the location into which the individual assay datafiles should be saved.

Parameters: location (str) – The path to a directory where the newly generated assay datafiles shall be saved. If this directory does not yet exist, it will be automatically made.

class qpcr.Readers.Readers.MultiSheetReader[source]

Bases: qpcr.Readers.Readers.MultiReader

Reads a single multi-assay datafile and extracts assays-of-interest and normaliser-assays based on decorators.

Input Data Files

Valid input files are multi-assay irregular excel files, that specify assays by one replicate identifier column and one Ct value column.

Separate assay tables may be either below one another (separated by blank lines!) or beside one another (requires transpose = True), but may be in DIFFERENT sheets. All assays from all sheets will be read!

Assays of interest and normaliser assays must be marked using decorators.

Parameters

filename (str) – A filepath to a raw data file, containing multiple assays that were decorated. Check out the documentation of the qpcr.Parsers to learn more about decorators.
**kwargs –

parse(*args, **kwargs)[source]: The MultiSheetReader only offers a pipe method! Hence, neither read nor parse will work directly!

pipe(filename: str, **kwargs)[source]

Reads a multi-assay and multi-sheet datafile with decorated assays. Any non-decorated assays are ignored!

Parameters

filename (str) – A filepath to a raw data file, containing multiple assays that were decorated. Check out the documentation of the qpcr.Parsers to learn more about decorators.
**kwargs – Any additional keyword arguments that should be passed to the read method of the qpcr.Parsers that extracts the datasets.

Returns

assays (dict or list) – Returns either the raw dictionary of dataframes returned by the Parser (if no qpcr.Assays could be made automatically) or a list of qpcr.Assay objects.
normalisers (dict or list) – Returns either the raw dictionary of dataframes returned by the Parser (if no qpcr.Assays could be made automatically) or a list of qpcr.Assay objects.

read(*args, **kwargs)[source]: The MultiSheetReader only offers a pipe method! Hence, neither read nor parse will work directly!

class qpcr.Readers.Readers.SingleReader(filename: Optional[str] = None, **kwargs)[source]

Bases: qpcr.Readers.Readers._CORE_Reader

Reads qpcr raw data files in csv or excel format to get a single dataset.

Input Data Files

Valid input files are either regular csv or excel files, or irregular csv or excel files, that specify assays by one replicate identifier column and one Ct value column.

Irregular input files may specify multiple assays as separate tables; in that case, one assay has to be selected using the assay argument. Separate assay tables may be either below one another (separated by blank lines!) or beside one another (requires transpose = True).

Note

If the provided file cannot be read as a “regular” file the Reader will automatically switch to parsing. However, if your file is a regular input file, you can force regular reading by passing the argument is_regular = True to the read method, which will prevent parsing and allow you to figure out why regular reading may have failed instead (the Reader will not provide further insight into why regular reading failed if it switches to parsing).
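
The Reader's internals are not shown here, so the following is only a sketch of the general try-regular-then-parse pattern that the note describes, including an is_regular switch that turns the fallback off so the original error becomes visible. All function names and the sample text are hypothetical, and csv text stands in for a real datafile.

```python
import csv
import io

# Made-up "irregular" csv: one line of meta-data precedes the actual table.
IRREGULAR = "exported on some date\nName,Ct\nctrl1,5.67\nctrl2,5.79\n"


def read_regular(text, id_label="Name", ct_label="Ct"):
    """Strict read: the very first line must be the table header."""
    reader = csv.DictReader(io.StringIO(text))
    fields = reader.fieldnames or []
    if id_label not in fields or ct_label not in fields:
        raise ValueError(f"first line is not a header with {id_label!r} and {ct_label!r}")
    return [(row[id_label], float(row[ct_label])) for row in reader]


def parse_irregular(text, id_label="Name", ct_label="Ct"):
    """Fallback: search for the header line, then read the table from there."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        cells = line.split(",")
        if id_label in cells and ct_label in cells:
            return read_regular("\n".join(lines[i:]), id_label, ct_label)
    raise ValueError("no header line found")


def read(text, is_regular=False):
    """Try regular reading first; fall back to parsing unless is_regular is set."""
    try:
        return read_regular(text)
    except ValueError:
        if is_regular:
            raise  # surface the reason why regular reading failed
        return parse_irregular(text)


data = read(IRREGULAR)
print(data)
```

Calling read(IRREGULAR, is_regular=True) on the same input raises the ValueError from the strict read instead of falling back.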

Parameters

filename (str) – A filepath to a raw data file. If the file is a csv file, it has to have two named columns: one for replicate names, one for Ct values. Both csv (, separated) and csv2 (; separated) are accepted. If the file is an excel file, the relevant sections of the spreadsheet are identified automatically, but they require identifying headers. By default it is assumed that replicate identifiers and Ct values are stored in columns named Name and Ct, but these can be changed using the id_label and ct_label arguments that can be passed as kwargs. The assay’s id can also be set as a kwarg.
**kwargs – Any additional keyword arguments that shall be passed to the read() method which is immediately called during init.

pipe(filename: str, **kwargs)[source]

A wrapper for read+parse+make_Assay

Returns: assay – A qpcr.Assay object of the extracted data
Return type: qpcr.Assay

read(filename: str, **kwargs)[source]

Reads the given data file.

Note

If the data file is an Excel file replicates and their Ct values will be extracted from the first excel sheet of the file by default. A separate sheet can be specified using sheet_name.

Parameters: filename (str) – A filepath to a raw data file. If the file is a csv file, it has to have two named columns: one for replicate names, one for Ct values. Both csv (, separated) and csv2 (; separated) are accepted. If the file is an excel file, the relevant sections of the spreadsheet are identified automatically, but they require identifying headers. By default it is assumed that replicate identifiers and Ct values are stored in columns named Name and Ct, but these can be changed using the id_label and ct_label arguments that can be passed as kwargs. Note, if only two columns are present anyway, they are assumed to be the Id (1st) and Ct (2nd) columns, and inputs for id_label and ct_label are ignored! The assay’s id can be set as a kwarg. By default the filename is adopted as id.
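
The column rules described above can be summarised in a short sketch: pick the separator (, or ;), and then either take the two columns by position or look them up by id_label and ct_label. The function read_two_columns is a hypothetical illustration and not the SingleReader's code. It also assumes that Ct values use a decimal point, which the description above does not specify.

```python
import csv
import io


def read_two_columns(text, id_label="Name", ct_label="Ct"):
    """Return (id, Ct) pairs from comma- or semicolon-separated text.

    Hypothetical helper for illustration only; not part of qpcr.
    """
    first_line = text.splitlines()[0]
    # Simple heuristic: whichever separator dominates the header line wins.
    delimiter = ";" if first_line.count(";") > first_line.count(",") else ","
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header, body = rows[0], rows[1:]
    if len(header) == 2:
        # Only two columns: positions decide, the labels are ignored.
        id_index, ct_index = 0, 1
    else:
        id_index, ct_index = header.index(id_label), header.index(ct_label)
    # Assumes decimal points in the Ct column (an assumption of this sketch).
    return [(row[id_index], float(row[ct_index])) for row in body if row]


two_col = read_two_columns("Sample,Value\nctrl1,5.67\nctrl2,5.79\n")
labelled = read_two_columns("Name;Ct;other_data\nctrl1;5.67;x\nctrl2;5.79;y\n")
print(two_col == labelled)
```

In the first call the headers Sample and Value do not match the default labels, yet the file is still read because it has exactly two columns.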