The qpcr.Readers
This module provides different Reader
classes that allow reading simple and complex datafiles
of various architectures.
Learn more about the available Readers. If you are interested in learning more about preprocessing the datafiles, check out the Decorator tutorial.
SingleReader
The SingleReader
is able to read both regular and irregular single-assay datafiles.
It can also read multi-assay datafiles but requires an assay
argument, specifying
which assay specifically to extract from it.
id | Ct | other_data |
---|---|---|
ctrl1 | 5.67 | … |
ctrl2 | 5.79 | … |
ctrl3 | 5.86 | … |
condA1 | 5.34 | … |
… | … | … |
from qpcr.Readers import SingleReader
reader = SingleReader()
myfile = "my_datafile.csv"
assay = reader.pipe( myfile )
MultiReader
The MultiReader
can read irregular multi-assay datafiles and extract all assays from them
either using a specific assay_pattern to find them or using decorators
(check out the documentation
of the qpcr.Parsers
for more information).
Some meta-data here | maybe today’s date | | | |
---|---|---|---|---|
Assay 1 | | | | |
id | Ct | other_data | | |
ctrl1 | 5.67 | … | | |
ctrl2 | 5.79 | … | | |
… | … | | | |
<- blank line here! | | | | |
Assay 2 | | | | |
id | Ct | other_data | | |
ctrl1 | 10.23 | … | | |
ctrl2 | 10.54 | … | | |
… | … | | | |
from qpcr.Readers import MultiReader
myfile = "my_datafile.xlsx"
reader = MultiReader()
# if we have a non-decorated file we can just use
assays = reader.pipe( myfile )
# if we have decorated our file, then we can use
assays, normalisers = reader.pipe( myfile, decorator = True )
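To make the layout above more concrete, here is a minimal standard-library sketch of how assay tables separated by blank lines could be located in a CSV export. It is not the qpcr implementation: the helper name `split_assays`, the default regular expression (which only stands in for the idea of an `assay_pattern`), and the example text are all assumptions made for illustration.

```python
import csv
import io
import re

def split_assays(text, assay_pattern=r"^Assay\s*\d+$"):
    """Split an irregular multi-assay CSV into {assay_name: [(id, Ct), ...]}.

    Hypothetical helper for illustration only; not part of qpcr.
    """
    pattern = re.compile(assay_pattern)
    assays = {}
    current = None       # name of the assay currently being collected
    header_seen = False  # whether the "id, Ct" header row was consumed
    for row in csv.reader(io.StringIO(text)):
        cells = [c.strip() for c in row]
        if not any(cells):
            # a blank line ends the current assay table
            current = None
            continue
        if pattern.match(cells[0]):
            # a line matching the pattern starts a new assay table
            current = cells[0]
            assays[current] = []
            header_seen = False
        elif current is not None:
            if not header_seen:
                header_seen = True  # skip the column header row
            else:
                assays[current].append((cells[0], float(cells[1])))
    return assays

# made-up example file content following the layout shown above
example = """Some meta-data here,today
Assay 1,,
id,Ct,other_data
ctrl1,5.67,x
ctrl2,5.79,x
,,
Assay 2,,
id,Ct,other_data
ctrl1,10.23,x
ctrl2,10.54,x
"""

assays = split_assays(example)
```

The sketch shows why the blank line between tables matters: without it, the rows of the second table would be appended to the first until the next assay header is found.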
MultiSheetReader
The MultiSheetReader
is able to read irregular multi-assay datafiles that contain assays in multiple
datasheets.
By default, the MultiSheetReader reads all sheets in an excel file. However, you can use the sheet
argument to specify a single sheet that should be read exclusively (thus turning the reader into a “MultiReader”).
BigTableReader
The BigTableReader
is able to read datafiles that store their assays in one single “big table”. It
can extract all assays from that big table using either simple extraction methods or decorators
depending
on the type of big table (check out the documentation of the BigTableReader
for more information on the
types of “big tables” and how to read them).
“Vertical” BigTables
assay | id | Ct | other_data |
---|---|---|---|
assay 1 | ctrl1 | 5.67 | … |
assay 1 | ctrl2 | 5.79 | … |
… | … | … | … |
assay 2 | ctrl1 | 10.23 | … |
… | … | … | … |
from qpcr.Readers import BigTableReader
myfile = "my_datafile.xlsx"
reader = BigTableReader()
assays, normalisers = reader.pipe(
    filename = myfile,
    decorator = True,
    # specify the kind of big table
    kind = "vertical",
    # specify which columns store
    # the relevant data
    assay_col = "Assay",
    id_col = "Name",
    ct_col = "Ct"
)
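As a rough illustration of what reading a vertical big table involves, the following standard-library sketch groups rows by their assay column. It is not how qpcr does it internally; the helper name `split_vertical` and the sample rows are assumptions, and only the column names mirror the `assay_col`, `id_col`, and `ct_col` arguments used above.

```python
from collections import defaultdict

def split_vertical(rows, assay_col="Assay", id_col="Name", ct_col="Ct"):
    """Group the rows of a vertical big table by assay.

    Hypothetical helper for illustration only; not part of qpcr.
    """
    assays = defaultdict(list)
    for row in rows:
        # any additional columns in `row` are simply ignored
        assays[row[assay_col]].append((row[id_col], float(row[ct_col])))
    return dict(assays)

# made-up rows following the vertical layout
rows = [
    {"Assay": "assay 1", "Name": "ctrl1", "Ct": "5.67", "other": "x"},
    {"Assay": "assay 1", "Name": "ctrl2", "Ct": "5.79", "other": "x"},
    {"Assay": "assay 2", "Name": "ctrl1", "Ct": "10.23", "other": "x"},
]

assays = split_vertical(rows)
```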
“Horizontal” BigTables
assay | ctrl1 | ctrl2 | … | other_data |
---|---|---|---|---|
assay 1 | 5.67 | 5.79 | … | … |
assay 2 | 10.23 | 10.54 | … | … |
… | … | … | … | … |
from qpcr.Readers import BigTableReader
myfile = "my_datafile.xlsx"
reader = BigTableReader()
assays, normalisers = reader.pipe(
    filename = myfile,
    # we specify that it's a horizontal table
    kind = "horizontal",
    # we **have to** specify the
    # number of replicates per group
    replicates = 3,
    # we also specify the group names (because they
    # will not be inferrable due to the different
    # column names of the replicates)
    names = ["control", "condition A", "condition B", "condition C"],
    # we must also specify which column specifies the assays
    # NOTE: in this mode this is handled by the `id_col` argument!
    id_col = "Name"
)
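The reason `replicates` and `names` are needed for horizontal tables can be sketched as follows: the replicate columns carry no usable group label, so consecutive columns have to be assigned to groups by position. This is only a sketch under that assumption, not qpcr code; `label_replicates` is a hypothetical helper and some of the Ct values are made up.

```python
def label_replicates(ct_values, replicates, names):
    """Assign a group name to each Ct value of one assay row by position.

    Hypothetical helper for illustration only; not part of qpcr.
    Assumes every group has the same number of replicates.
    """
    if replicates * len(names) != len(ct_values):
        raise ValueError("replicates x groups must equal the number of Ct columns")
    labelled = []
    for index, ct in enumerate(ct_values):
        # columns 0..replicates-1 belong to the first group, and so on
        labelled.append((names[index // replicates], ct))
    return labelled

# one assay row of a horizontal table (last two values are made up)
ct_values = [5.67, 5.79, 5.86, 5.34, 5.41, 5.29]
labelled = label_replicates(ct_values, replicates=3, names=["control", "condition A"])
```

If the number of columns does not match the declared groups, the sketch raises an error instead of guessing, which reflects why these arguments are described as mandatory above.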
“Hybrid” BigTables
id | assay 1 | assay 2 | other_data |
---|---|---|---|
ctrl | 7.65 | 11.78 | … |
ctrl | 7.87 | 11.56 | … |
ctrl | 7.89 | 11.76 | … |
condA | 7.56 | 11.98 | … |
condA | 7.34 | 11.56 | … |
… | … | … | … |
from qpcr.Readers import BigTableReader
myfile = "my_datafile.xlsx"
reader = BigTableReader()
assays, normalisers = reader.pipe(
    filename = myfile,
    # we specify that it's a hybrid table
    kind = "hybrid",
    # we **have to** specify the
    # number of replicates per group
    replicates = 3,
    # we also specify the group names (because they
    # will not be inferrable due to the different
    # column names of the replicates)
    names = ["control", "condition A", "condition B", "condition C"],
    id_col = "Sample",
    # in this setup each assay is already stored as a separate datacolumn
    # so we must provide either a list of the column names or use decorators
    ct_col = ["ActB", "HNRNPL", "SRSF11"],
)
# NOTE: Because we did not specify any decorators here, the normalisers list will be created but be empty!
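Conceptually, passing a list of column names as `ct_col` amounts to pairing the shared identifier column with each assay column in turn. The sketch below illustrates that idea with plain Python; `split_hybrid` is a hypothetical helper, not part of qpcr, and the rows are example data.

```python
def split_hybrid(rows, id_col, ct_cols):
    """Turn a hybrid big table into one (id, Ct) list per assay column.

    Hypothetical helper for illustration only; not part of qpcr.
    """
    return {
        assay: [(row[id_col], float(row[assay])) for row in rows]
        for assay in ct_cols
    }

# example rows following the hybrid layout
rows = [
    {"Sample": "ctrl", "ActB": "7.65", "HNRNPL": "11.78"},
    {"Sample": "ctrl", "ActB": "7.87", "HNRNPL": "11.56"},
    {"Sample": "condA", "ActB": "7.56", "HNRNPL": "11.98"},
]

assays = split_hybrid(rows, id_col="Sample", ct_cols=["ActB", "HNRNPL"])
```

Only the columns named in `ct_cols` are extracted, so any other column in the table is left out.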
- class qpcr.Readers.Readers.BigTableReader[source]
Bases:
qpcr.Readers.Readers.MultiReader
Reads a single multi-assay datafile and reads assays-of-interest and normaliser-assays based on decorators.
Input Data Files
Valid input files are multi-assay irregular csv or excel files that specify assays as one big table containing all information together. Note that this implies that the entire data is stored in a single sheet (if using excel files).
Three possible data architectures are allowed:
Vertical Big Tables
Big Tables of this kind require three columns (any additional columns are disregarded): one specifying the assay, one specifying the replicate identifiers, and one specifying the Ct values. An additional fourth column (@qpcr) may be filled with decorators but this is not necessary in this setup.
Horizontal Big Tables
Big Tables of this kind store replicates from assays in side-by-side columns. The replicates may be labelled numerically or all have the same column header. A second column is required specifying the replicate identifier.
Note, this kind of setup requires decorators above the first replicate of each assay, as well as user-defined replicates!
Note, the column headers have to be unique to the table! Also, a word of warning with regard to replicate assays. The entries in the assay defining column must be unique! If you have multiple assays from the same gene which therefore also have the same id they will be interpreted as belonging together and will be assembled into the same qpcr.Assay object. However, this will result in differently sized Assays which will cause problems downstream when you (or a qpcr.Normaliser) try to assemble a qpcr.Results object!
Hybrid Big Tables
Big Tables of this kind store Ct values of different assays in separate side-by-side columns, but they store the replicate identifiers as a separate column. Hence, they combine aspects of vertical and horizontal Big Tables.
Note, two options exist to read this kind of setup.
- A list of ct_col values can be passed which contains the column header of each assay.
- The table can be decorated, in which case only decorated assays (columns) are extracted.
Please note that the two methods of reading this table are mutually exclusive! So, if you decorate your table you cannot pass specific assay headers to the ct_col argument anymore.
- pipe(filename: str, kind: str, id_col: str, **kwargs)[source]
A wrapper for read+parse+make_Assays
Note
This is the suggested use of the BigTableReader.
- Parameters
filename (str) – A filepath to a raw data file, containing multiple assays that were decorated. Check out the documentation of the qpcr.Parsers to learn more about decorators.
kind (str) – Specifies the kind of Big Table from the file. This may be either "horizontal", "vertical", or "hybrid".
id_col (str) – The column header specifying the replicate identifiers (or “assays” in case of horizontal big tables).
**kwargs – Any additional columns or keyword arguments.
- Returns
assays (dict or list) – Returns either the raw dictionary of dataframes returned by the Parser (if no qpcr.Assays could be made automatically) or a list of qpcr.Assay objects.
normalisers (dict or list) – Returns either the raw dictionary of dataframes returned by the Parser (if no qpcr.Assays could be made automatically) or a list of qpcr.Assay objects.
- read(filename: str, kind: str, id_col: str, **kwargs)[source]
Reads a regular or irregular csv or excel datafile that contains data stored in a single big table. The Reader first tries to read the file regularly; if this fails, it resorts to parsing to identify the relevant sections of the data.
- Parameters
filename (str) – A filepath to a raw data file, containing multiple assays that were decorated. Check out the documentation of the qpcr.Parsers to learn more about decorators.
kind (str) – Specifies the kind of Big Table from the file. This may be either "horizontal", "vertical", or "hybrid".
id_col (str) – The column header specifying the replicate identifiers (or “assays” in case of horizontal big tables).
**kwargs – Any additional columns or keyword arguments.
- class qpcr.Readers.Readers.MultiReader(filename: Optional[str] = None, **kwargs)[source]
Bases:
qpcr.Readers.Readers.SingleReader, qpcr._auxiliary._ID
Reads a single multi-assay datafile and reads assays-of-interest and normaliser-assays based on decorators.
Input Data Files
Valid input files are multi-assay irregular csv or excel files that specify assays by one replicate identifier column and one Ct value column.
Separate assay tables may be either below one another (separated by blank lines!) or beside one another (requires transpose = True), but ALL in the SAME sheet!
Assays of interest and normaliser assays must be marked using decorators.
Note
MultiReader can transform the extracted datasets directly into qpcr.Assay objects using MultiReader.make_Assays(). It will perform grouping of assays if possible but will return raw assays if not! get will either return a dictionary of the raw dataframes or a list of qpcr.Assay objects.
- Parameters
filename (str) – A filepath to a raw data file, containing multiple assays that were decorated. Check out the documentation of the qpcr.Parsers to learn more about decorators.
**kwargs – Any additional keyword arguments that should be passed to the read method, which is immediately called during init if a filename is provided.
- assays(which: Optional[str] = None)[source]
- Parameters
which (str) – If specified it only returns the data for the specified assay. Otherwise (default) it returns all assays.
- Returns
data (dict or list) – Returns either the raw dictionary of dataframes returned by the Parser (if make_Assays has not been run yet) or a list of qpcr.Assay objects.
names (list) – A list of the names of all extracted assays.
- get(which: str)[source]
Returns the stored assays or normalisers.
- Parameters
which (str) – Can be either "assays" or "normalisers" or any specific assay identifier.
- Returns
data – Returns either the raw dictionary of dataframes returned by the Parser (if make_Assays has not been run yet) or a list of qpcr.Assay objects.
- Return type
dict or list
- normalisers(which: Optional[str] = None)[source]
- Parameters
which (str) – If specified it only returns the data for the specified normaliser. Otherwise (default) it returns all normalisers.
- Returns
data (dict or list) – Returns either the raw dictionary of dataframes returned by the Parser (if make_Assays has not been run yet) or a list of qpcr.Assay objects.
names (list) – A list of the names of all extracted normalisers.
- parse(**kwargs)[source]
Extracts the datasets (assays) from the read datafile.
- Parameters
**kwargs – Any additional keyword arguments that should be passed to the qpcr.Parsers’ parse method that extracts the datasets.
- pipe(filename: str, **kwargs)[source]
A wrapper for read+parse+make_Assays
Note
This is the suggested use of MultiReader. If a directory has been specified into which the datafiles shall be saved, then saving will automatically be done.
- Parameters
filename (str) – A filepath to an input datafile.
**kwargs – Any additional keyword argument that will be passed to any of the wrapped methods.
- Returns
data – A tuple of the found assays-of-interest (first element) and normaliser-assays (second element).
- Return type
tuple
- read(filename: str, **kwargs)[source]
Reads a multi-assay datafile with decorated assays. Any non-decorated assays are ignored!
- Parameters
filename (str) – A filepath to a raw data file, containing multiple assays that were decorated. Check out the documentation of the qpcr.Parsers to learn more about decorators.
**kwargs – Any additional keyword arguments that should be passed to the qpcr.Parsers’ read method that extracts the datasets.
- save_to(location: Optional[str] = None)[source]
Sets the location into which the individual assay datafiles should be saved.
- Parameters
location (str) – The path to a directory where the newly generated assay datafiles shall be saved. If this directory does not yet exist, it will be automatically made.
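The "created if missing" behaviour described for location can be pictured with the standard library alone. This is a sketch of the general idea, not the code qpcr uses; `ensure_directory` and the example path are hypothetical.

```python
import os
import tempfile

def ensure_directory(location):
    """Create `location` (and any missing parents) if needed and return it.

    Hypothetical helper for illustration only; not part of qpcr.
    """
    os.makedirs(location, exist_ok=True)
    return location

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "assays", "run1")  # made-up output directory
    ensure_directory(target)
    created = os.path.isdir(target)
    ensure_directory(target)  # calling it again on an existing directory is harmless
```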
- class qpcr.Readers.Readers.MultiSheetReader[source]
Bases:
qpcr.Readers.Readers.MultiReader
Reads a single multi-assay datafile and reads assays-of-interest and normaliser-assays based on decorators.
Input Data Files
Valid input files are multi-assay irregular excel files that specify assays by one replicate identifier column and one Ct value column.
Separate assay tables may be either below one another (separated by blank lines!) or beside one another (requires transpose = True), but may be in DIFFERENT sheets. All assays from all sheets will be read!
Assays of interest and normaliser assays must be marked using decorators.
- Parameters
filename (str) – A filepath to a raw data file, containing multiple assays that were decorated. Check out the documentation of the qpcr.Parsers to learn more about decorators.
**kwargs –
- parse(*args, **kwargs)[source]
The MultiSheetReader only offers a pipe method! Hence, neither read nor parse will work directly!
- pipe(filename: str, **kwargs)[source]
Reads a multi-assay and multi-sheet datafile with decorated assays. Any non-decorated assays are ignored!
- Parameters
filename (str) – A filepath to a raw data file, containing multiple assays that were decorated. Check out the documentation of the qpcr.Parsers to learn more about decorators.
**kwargs – Any additional keyword arguments that should be passed to the qpcr.Parsers’ read method that extracts the datasets.
- Returns
assays (dict or list) – Returns either the raw dictionary of dataframes returned by the Parser (if no qpcr.Assays could be made automatically) or a list of qpcr.Assay objects.
normalisers (dict or list) – Returns either the raw dictionary of dataframes returned by the Parser (if no qpcr.Assays could be made automatically) or a list of qpcr.Assay objects.
- class qpcr.Readers.Readers.SingleReader(filename: Optional[str] = None, **kwargs)[source]
Bases:
qpcr.Readers.Readers._CORE_Reader
Reads qpcr raw data files in csv or excel format to get a single dataset.
Input Data Files
Valid input files are either regular csv or excel files, or irregular csv or excel files that specify assays by one replicate identifier column and one Ct value column.
Irregular input files may specify multiple assays as separate tables; in that case one assay has to be selected using the assay argument. Separate assay tables may be either below one another (separated by blank lines!) or beside one another (requires transpose = True).
Note
If the provided file cannot be read as a “regular” file the Reader will automatically switch to parsing. However, if your file is a regular input file, you can force regular reading by passing the argument is_regular = True to the read method, which will prevent parsing and allow you to figure out why regular reading may have failed instead (the Reader will not provide further insight into why regular reading failed if it switches to parsing).
- Parameters
filename (str) – A filepath to a raw data file. If the file is a csv file, it has to have two named columns; one for replicate names, one for Ct values. Both csv (, separated) and csv2 (; separated) are accepted. If the file is an excel file, the relevant sections of the spreadsheet are identified automatically, but they require identifying headers. By default it is assumed that replicate identifiers and Ct values are stored in columns named Name and Ct, but these can be changed using the id_label and ct_label arguments that can be passed as kwargs. The assay’s id can also be set as a kwarg.
**kwargs – Any additional keyword arguments that shall be passed to the read() method which is immediately called during init.
- pipe(filename: str, **kwargs)[source]
A wrapper for read+parse+make_Assay
- Returns
assay – A qpcr.Assay object of the extracted data
- Return type
qpcr.Assay
- read(filename: str, **kwargs)[source]
Reads the given data file.
Note
If the data file is an Excel file replicates and their Ct values will be extracted from the first excel sheet of the file by default. A separate sheet can be specified using sheet_name.
- Parameters
filename (str) – A filepath to a raw data file. If the file is a csv file, it has to have two named columns; one for replicate names, one for Ct values. Both csv (, separated) and csv2 (; separated) are accepted. If the file is an excel file, the relevant sections of the spreadsheet are identified automatically, but they require identifying headers. By default it is assumed that replicate identifiers and Ct values are stored in columns named Name and Ct, but these can be changed using the id_label and ct_label arguments that can be passed as kwargs. Note, if only two columns are present anyway, they are assumed to be the Id (1st) and Ct (2nd) column, and inputs for id_label and ct_label are ignored! The assay’s id can be set as a kwarg. By default the filename is adopted as id.
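The column rules described for regular csv files can be summarised in a short standard-library sketch. It is not the SingleReader implementation: `read_single` is a hypothetical helper, the separator is guessed from the header line only, and decimal points (not decimal commas) are assumed in both formats.

```python
import csv
import io

def read_single(text, id_label="Name", ct_label="Ct"):
    """Read a regular single-assay csv/csv2 text into a list of (id, Ct) tuples.

    Hypothetical helper for illustration only; not part of qpcr.
    """
    header = text.splitlines()[0]
    # csv2 uses ";" as separator, csv uses ","
    delimiter = ";" if header.count(";") > header.count(",") else ","
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    columns = [c.strip() for c in next(reader)]
    if len(columns) == 2:
        # only two columns: 1st is taken as id, 2nd as Ct; the labels are ignored
        id_idx, ct_idx = 0, 1
    else:
        id_idx, ct_idx = columns.index(id_label), columns.index(ct_label)
    return [(row[id_idx], float(row[ct_idx])) for row in reader if row]

# made-up file contents: a "," separated file and a two-column ";" separated file
csv_text = "Name,Ct,other\nctrl1,5.67,x\nctrl2,5.79,x\n"
csv2_text = "sample;value\nctrl1;5.67\nctrl2;5.79\n"

from_csv = read_single(csv_text)
from_csv2 = read_single(csv2_text)
```

In the second example the headers are neither Name nor Ct, yet reading succeeds because a two-column file is interpreted positionally.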