"""
.. _qpcr.Readers:
This module provides different ``Reader`` classes that allow reading simple and complex datafiles
of various architectures.
Learn more about the available Readers. If you are interested to learn more about preprocessing the datafiles,
check out the `Decorator tutorial <https://qpcr.readthedocs.io/en/latest/tutorials/8_decorating_datafiles.html>`_.
SingleReader
------------
The ``SingleReader`` is able to read both regular and irregular single-assay datafiles.
It can also read multi-assay datafiles but requires an ``assay`` argument, specifying
which assay specifically to extract from it.
+--------+-------+-------------+
| id | Ct | other_data |
+========+=======+=============+
| ctrl1 | 5.67 | ... |
+--------+-------+-------------+
| ctrl2 | 5.79 | ... |
+--------+-------+-------------+
| ctrl3 | 5.86 | ... |
+--------+-------+-------------+
| condA1 | 5.34 | ... |
+--------+-------+-------------+
| ... | ... | ... |
+--------+-------+-------------+
.. code-block:: python
from qpcr.Readers import SingleReader
reader = SingleReader()
myfile = "my_datafile.csv"
assay = reader.pipe( myfile )
MultiReader
-----------
The ``MultiReader`` can read irregular multi-assay datafiles and extract all assays from them
either using a specific `assay_pattern` to find them or using ``decorators`` (check out the documentation
of the ``qpcr.Parsers`` for more information).
+----------------------+---------------------+-------------+----------------------+---+
| Some meta-data here | maybe today's date | | | |
+======================+=====================+=============+======================+===+
| Assay 1 | | | | |
+----------------------+---------------------+-------------+----------------------+---+
| id | Ct | other_data | | |
+----------------------+---------------------+-------------+----------------------+---+
| ctrl1 | 5.67 | ... | | |
+----------------------+---------------------+-------------+----------------------+---+
| ctrl2 | 5.79 | ... | | |
+----------------------+---------------------+-------------+----------------------+---+
| ... | ... | | | |
+----------------------+---------------------+-------------+----------------------+---+
| | | | <- blank line here! | |
+----------------------+---------------------+-------------+----------------------+---+
| Assay 2 | | | | |
+----------------------+---------------------+-------------+----------------------+---+
| id | Ct | other_data | | |
+----------------------+---------------------+-------------+----------------------+---+
| ctrl1 | 10.23 | ... | | |
+----------------------+---------------------+-------------+----------------------+---+
| ctrl2 | 10.54 | ... | | |
+----------------------+---------------------+-------------+----------------------+---+
| ... | ... | | | |
+----------------------+---------------------+-------------+----------------------+---+
.. code-block::
from qpcr.Readers import MultiReader
myfile = "my_datafile.xlsx"
reader = MultiReader()
# if we have a non-decorated file we can just use
assays = reader.pipe( myfile )
# if we have decorated our file, then we can use
assays, normalisers = reader.pipe( myfile, decorator = True )
MultiSheetReader
----------------
The ``MultiSheetReader`` is able to read irregular multi-assay datafiles that contain assays in multiple
datasheets.
By default the MultiSheetReader will read *all* sheets in an excel file. However, you can specify *a single sheet* that should be read
exclusively (thus turning the reader to a "MultiReader") using the ``sheet`` argument a specific sheet can be specified for reading.
BigTableReader
--------------
The ``BigTableReader`` is able to read datafiles that store their assays in one single "big table". It
can extract all assays from that big table using either simple extraction methods or ``decorators`` depending
on the type of big table (check out the documentation of the ``BigTableReader`` for more information on the
types of "big tables" and how to read them).
"Vertical" BigTables
+----------+--------+-------+-------------+
| assay | id | Ct | other_data |
+==========+========+=======+=============+
| assay 1 | ctrl1 | 5.67 | ... |
+----------+--------+-------+-------------+
| assay 1 | ctrl2 | 5.79 | ... |
+----------+--------+-------+-------------+
| ... | ... | ... | ... |
+----------+--------+-------+-------------+
| assay 2 | ctrl1 | 10.23 | ... |
+----------+--------+-------+-------------+
| ... | ... | ... | ... |
+----------+--------+-------+-------------+
.. code-block:: python
from qpcr.Readers import BigTableReader
myfile = "my_datafile.xlsx"
reader = BigTableReader()
assays, normalisers = reader.pipe(
filename = file,
decorator = True,
# specify the kind of big table
kind = "vertical",
# specify which columns store
# the relevant data
assay_col = "Assay",
id_col = "Name",
ct_col = "Ct"
)
"Horizontal" BigTables
+----------+--------+--------+------+-------------+
| assay | ctrl1 | ctrl2 | ... | other_data |
+==========+========+========+======+=============+
| assay 1 | 5.67 | 5.79 | ... | ... |
+----------+--------+--------+------+-------------+
| assay 2 | 10.23 | 10.54 | ... | ... |
+----------+--------+--------+------+-------------+
| ... | ... | ... | ... | ... |
+----------+--------+--------+------+-------------+
.. code-block:: python
from qpcr.Readers import BigTableReader
myfile = "my_datafile.xlsx"
reader = BigTableReader()
assays, normalisers = reader.pipe(
filename = file,
# we specify that it's a horizontal table
kind = "horizontal",
# we **have to** specify the
# number of replicates per group
replicates = 3,
# we also specify the group names (because they
# will not be inferrable due to the different
# column names of the replicates)
names = ["control", "condition A", "condition B", "condition C"],
# we must also specify which column specifies the assays
# NOTE: in this mode this is handled by the `id_col` argument!
id_col = "Name"
)
"Hybrid" BigTables
+-------+----------+----------+-------------+
| id | assay 1 | assay 2 | other_data |
+=======+==========+==========+=============+
| ctrl | 7.65 | 11.78 | ... |
+-------+----------+----------+-------------+
| ctrl | 7.87 | 11.56 | ... |
+-------+----------+----------+-------------+
| ctrl | 7.89 | 11.76 | ... |
+-------+----------+----------+-------------+
| condA | 7.56 | 11.98 | ... |
+-------+----------+----------+-------------+
| condA | 7.34 | 11.56 | ... |
+-------+----------+----------+-------------+
| ... | ... | ... | ... |
+-------+----------+----------+-------------+
.. code-block:: python
from qpcr.Readers import BigTableReader
myfile = "my_datafile.xlsx"
reader = BigTableReader()
assays, normalisers = reader.pipe(
filename = file,
# we specify that it's a horizontal table
kind = "hybrid",
# we **have to** specify the
# number of replicates per group
replicates = 3,
# we also specify the group names (because they
# will not be inferrable due to the different
# column names of the replicates)
names = ["control", "condition A", "condition B", "condition C"],
id_col = "Sample",
# in this setup each assay is already stored as a separate datacolumn
# so we must provide either a list of the column names or use decorators
ct_col = ["ActB", "HNRNPL", "SRSF11"],
)
# NOTE: Because we did not specify any decorators here, the normalisers list will be created but be empty!
"""
# Concept to link the Readers to the DataReader
# All Readers define a _DataReader method that
# specifies which of its methods is supposed
# to be used for (mostly pipe, sometimes read...)
# _DataReader methods *must* return the data they read!
import logging
import pandas as pd
import qpcr
import qpcr.defaults as defaults
import qpcr._auxiliary as aux
import qpcr._auxiliary.warnings as aw
import qpcr.Parsers as Parsers
import os
import numpy as np
from copy import deepcopy
import re
logger = aux.default_logger()
__pdoc__ = {"_CORE_Reader": True}
# migrate default settings from __init__
raw_col_names = defaults.raw_col_names
supported_filetypes = defaults.supported_filetypes
default_dataset_header = defaults.dataset_header
default_id_header = defaults.id_header
default_ct_header = defaults.ct_header
class _CORE_Reader(aux._ID):
"""
The class handling the core functions of the Reader class.
Both the standard qpcr.Reader as well as the qpcr._Qupid_Reader
inherit from this.
"""
__slots__ = "_src", "_delimiter", "_header", "_df", "_replicates", "_names"
def __init__(self):
super().__init__()
self._src = None
self._delimiter = None
self._header = 0
self._df = None
self._replicates = None
self._names = None
def get(self):
"""
Returns
-------
data : pd.DataFrame
The dataframe from the datafile.
"""
return self._df
def n(self):
"""
Returns
-------
n : int
The number of replicates (entries) in the dataframe.
"""
return len(self._df[raw_col_names[0]])
def make_Assay(self):
"""
Converts the extracted dataset into an ``qpcr.Assay``.
Returns
--------
Assay : qpcr.Assay
The ``qpcr.Assay`` from the extracted dataset.
"""
logger.debug(f"{self._df=}")
assay = self._make_new_Assay(self.id(), self._df)
return assay
def read(self, **kwargs):
"""
Reads the given data file.
If the data file is an Excel file replicates and their Ct values will be
extracted from the first excel sheet of the file. Note, this assumes by default
that the replicates are headed by the label `"Name"` and the corresponding Ct values
are headed by the label `"Ct"`. Both labels have to be on the same row.
If these labels do not match your excel file, you may
specify `id_label` and `ct_label` as additional arguments.
"""
suffix = self._filesuffix()
# check for a valid input file
if suffix not in supported_filetypes:
e = aw.MultiReaderError("empty_data", file=self._src)
logger.critical(e)
raise e
if suffix == "csv":
# first try simple read of "regular files"
try:
logging.info("trying to read the file regularly...")
self._csv_read(**kwargs)
except Exception as e:
# users can force-regular reading mode
is_regular = aux.from_kwargs("is_regular", False, kwargs, rm=True)
if is_regular:
# print out warning
logger.error(e)
return
logger.info("unable to regularly read file. Resort to parsing...")
# setup parser
parser = Parsers.CsvParser()
self._prep_Parser(kwargs, parser)
assay_of_interest = aux.from_kwargs("assay", None, kwargs, rm=True)
# pipe the datafile through the parser
parser.pipe(self._src, **kwargs)
# get the data
self._get_single_assay(parser, assay_of_interest)
elif suffix == "xlsx":
try:
logging.info("trying to read the file regularly...")
self._excel_read(**kwargs)
except Exception as e:
# users can force-regular reading mode
is_regular = aux.from_kwargs("is_regular", False, kwargs, rm=True)
if is_regular:
# print out warning
logger.error(e)
return
logger.info("unable to regularly read file. Resort to parsing...")
# setup parser
parser = Parsers.ExcelParser()
self._prep_Parser(kwargs, parser)
# check for sheet_name
sheet_name = aux.from_kwargs("sheet_name", 0, kwargs, rm=True)
# store assay-of-interest
assay_of_interest = aux.from_kwargs("assay", None, kwargs, rm=True)
# pipe the datafile through the parser
parser.read(self._src, sheet_name=sheet_name)
parser.parse(**kwargs)
# get the data
self._get_single_assay(parser, assay_of_interest)
def names(self, names: (list or dict)):
"""
Set names for replicates groups.
Parameters
----------
names : list or dict
Either a ``list`` (new names without repetitions) or ``dict`` (key = old name, value = new name) specifying new group names.
Group names only need to be specified once, and are applied to all replicate entries.
"""
if names is not None:
self._names = names
return self._names
def replicates(self, replicates: (int or tuple or str) = None):
"""
Either sets or gets the replicates settings to be used for grouping
Before they are assigned, replicates are vetted to ensure they cover all data entries.
Parameters
----------
replicates : int or tuple or str
Can be an ``integer`` (equal group sizes, e.g. ``3`` for triplicates),
or a ``tuple`` (uneven group sizes, e.g. ``(3,2,3)`` if the second group is only a duplicate).
Another method to achieve the same thing is to specify a "formula" as a string of how to create a replicate tuple.
The allowed structure of such a formula is ``n:m`` where ``n`` is the number of replicates in a group and ``m`` is the number of times
this pattern is repeated (if no `:m` is specified ``:1`` is assumed). So, as an example, if there are 12 groups which are triplicates, but
at the end there is one which only has a single replicate (like the commonly measured diluent qPCR sample), we could either specify the tuple
individually as ``replicates = (3,3,3,3,3,3,3,3,3,3,3,3,1)`` or we use the formula to specify ``replicates = "3:12,1"``. Of course, this works for
any arbitrary setting such as ``"3:5,2:5,10,3:12"`` (which specifies five triplicates, followed by two duplicates, a single decaplicate, and twelve triplicates again – truly a dataset from another dimension)...
"""
if replicates is not None:
self._replicates = replicates
return self._replicates
def _get_single_assay(self, parser, assay_of_interest):
"""
Gets a single dataset from the Parser
"""
# check if there are multiple datasets
# and if so, check if we got a specified assay_of_interest
if len(parser.assays()) > 1:
if assay_of_interest is None:
e = aw.ReaderError("cannot_read_multifile", file=self._src, assays=parser.assays())
logger.critical(e)
SystemExit(e)
self._df = parser.get(assay_of_interest)
self.id_reset()
self.id(assay_of_interest)
# if only one assay is present anyway, get that one
else:
assay_of_interest = parser.assays()[0]
self._df = parser.get(assay_of_interest)
self.id_reset()
self.id(assay_of_interest)
def _prep_Parser(self, kwargs, parser):
transpose = aux.from_kwargs("transpose", False, kwargs, rm=True)
if transpose:
parser.transpose()
# setup patterns and store assay-of-interest
assay_pattern = aux.from_kwargs("assay_pattern", "Rotor-Gene", kwargs)
parser.assay_pattern(assay_pattern)
# get data column labels
id_label = aux.from_kwargs("id_label", "Name", kwargs, rm=True)
ct_label = aux.from_kwargs("ct_label", "Ct", kwargs, rm=True)
parser.labels(id_label, ct_label)
def _csv_read(self, **kwargs):
"""
Reads the given data file if it's a csv file
This is the basic default reading method for
regular csv files.
"""
df = None
# header = aux.from_kwargs("header", 0, kwargs, rm = True)
try:
df = pd.read_csv(
self._src,
sep=self._delimiter,
header=self._header,
# names = raw_col_names
)
except Exception as e:
logger.exception(e)
e = aw.ReaderError("cannot_read_csv", file=self._src)
logger.critical(e)
SystemExit(e)
# try to get a FileID, from the kwargs
# if that fails, try to get the one from fileID
id = aux.from_kwargs("id", None, kwargs, rm=True)
if id is not None:
self.id_reset()
self.id(id)
elif isinstance(self._src, str):
self.id_reset()
self.id(aux.fileID(self._src))
# vet and crop the dataframe where necessary
df = self._vet_single_assay_df(kwargs, df)
self._df = df
def _excel_read(self, **kwargs):
"""
Reads the given data file if it's an excel file
This is the basic default reading method for
regular excel files.
"""
df = None
sheet_name = aux.from_kwargs("sheet_name", 0, kwargs, rm=True)
# header = aux.from_kwargs("header", 0, kwargs, rm = True)
try:
df = pd.read_excel(
self._src,
sheet_name=sheet_name,
header=self._header,
# names = raw_col_names
)
except Exception as e:
logger.exception(e)
e = aw.ReaderError("cannot_read_csv", file=self._src)
logger.critical(e)
SystemExit(e)
# try to get a FileID, from the kwargs
# if that fails, try to get the one from fileID
id = aux.from_kwargs("id", None, kwargs, rm=True)
if id is not None:
self.id_reset()
self.id(id)
elif isinstance(self._src, str):
self.id_reset()
self.id(aux.fileID(self._src))
# vet and crop the dataframe where necessary
df = self._vet_single_assay_df(kwargs, df)
self._df = df
def _vet_single_assay_df(self, kwargs, df):
"""
Vets that both Id and Ct columns are present in the data
and if so crops the df to the relevant columns, or checks if
only two columns are present anyway and then assumes Id+Ct as these two.
"""
# check if we got exactly two columns only
logger.debug(f"df at the start\n{df}")
if len(df.columns) == 2:
# just get the current column names for later renaming
Id, Ct = df.columns
else:
# check if a valid Ct column was found
Ct = aux.from_kwargs("ct_label", default_ct_header, kwargs)
Id = aux.from_kwargs("id_label", default_id_header, kwargs)
logger.debug(f"{df.columns=}")
valid_data = Ct in df.columns and Id in df.columns
if not valid_data:
e = aw.ReaderError("cannot_find_datacols", id_label=Id, ct_label=Ct)
logger.info(e) # the way this function is wrapped, this is worth only an info...
raise e
else:
# get only the relevant data columns
df = df[[Id, Ct]]
# make sure to convert Ct values to float
tmp_parser = Parsers.CsvParser()
Ct_col = df[Ct].to_numpy()
df[Ct] = tmp_parser._convert_to_numeric(self.id(), Ct_col)
# rename to qpcr default headers (id + Ct)
df = df.rename(columns={Id: raw_col_names[0], Ct: raw_col_names[1]})
logger.debug(f"df at the end (after rename)\n{df}")
return df
def _filesuffix(self):
"""
Returns the file-suffix of the provided file
"""
try:
suffix = self._src.split(".")[-1]
return suffix
except Exception as e:
logger.debug(e)
logger.info("Could not determine file suffix...")
def _make_new_Assay(self, name, df):
"""
Makes a new Assay object and performs group() already...
"""
new_assay = qpcr.Assay(df=df, id=name, replicates=self._replicates, group_names=self._names)
return new_assay
[docs]class SingleReader(_CORE_Reader):
"""
Reads qpcr raw data files in csv or excel format to get a single dataset.
Input Data Files
Valid input files are either regular ``csv`` or ``excel`` files, or irregular ``csv`` or ``excel`` files,
that specify assays by one replicate identifier column and one Ct value column.
Irregular input files may specify multiple assays as separate tables,
one assay has to be selected using the ``assay`` argument.
Separate assay tables may be either below one another (separated by blank lines!)
or besides one another (requires `transpose = True`).
Note
----
If the provided file cannot be read as a "regular" file the Reader will automatically
switch to parsing. However, if your file *is* a regular input file, you can force regular reading
by passing the argument ``is_regular = True`` to the ``read`` method, which will prevent parsing and allow
you to figure out why regular reading may have failed instead (the Reader will not
provide further insight into why regular reading failed if it switches to parsing).
Parameters
----------
filename : str
A filepath to a raw data file.
If the file is a ``csv`` file, it has to have two named columns; one for replicate names, one for Ct values.
Both csv (``,`` spearated) and csv2 (``;`` separated) are accepted.
If the file is an ``excel`` file it the relevant sections of the spreadsheet are identified automatically.
But they require identifying headers. By default it is assumed that replicate identifiers and Ct values are
stored in columns named ``Name`` and ``Ct`` but these can be changed using
the `id_label` and `ct_label` arguments that can be passed as kwargs.
Also the assay's ``id`` can be set as a kwarg.
**kwargs
Any additional keyword arguments that shall be passed to the `read()` method which is immediately called during init.
"""
def __init__(self, filename: str = None, **kwargs) -> pd.DataFrame:
super().__init__()
self._src = filename
self._delimiter = None
self._header = 0
if self._src is not None:
self.read(self._src, **kwargs)
[docs] def read(self, filename: str, **kwargs):
"""
Reads the given data file.
Note
-----
If the data file is an Excel file replicates and their Ct values will be
extracted from the first excel sheet of the file by default.
A separate sheet can be specified using `sheet_name`.
Parameters
----------
filename : str
A filepath to a raw data file.
If the file is a ``csv`` file, it has to have two named columns; one for replicate names, one for Ct values.
Both csv (``,`` spearated) and csv2 (``;`` separated) are accepted.
If the file is an ``excel`` file it the relevant sections of the spreadsheet are identified automatically.
But they require identifying headers.
By default it is assumed that replicate identifiers and Ct values are
stored in columns named ``Name`` and ``Ct`` but these can be changed using
the ``id_label`` and ``ct_label`` arguments that can be passed as kwargs.
Note, if only two columns are present anyway, they are assumed to be Id (1st) and Ct (2nd) column,
and inputs for ``id_label`` and ``ct_label`` are being ignored!
The assay's ``id`` can be set as a kwarg. By default the filename is adopted as id.
"""
self._src = filename
self._replicates = aux.from_kwargs("replicates", None, kwargs)
self._names = aux.from_kwargs("names", None, kwargs)
self._header = aux.from_kwargs("header", 0, kwargs, rm=True)
if self._filesuffix() == "csv":
self._delimiter = ";" if self._is_csv2() else ","
super().read(**kwargs)
[docs] def pipe(self, filename: str, **kwargs):
"""
A wrapper for read+parse+make_Assay
Returns
-------
assay : qpcr.Assay
An ``qpcr.Assay`` object of the extracted data
"""
self.read(filename=filename, **kwargs)
assay = self.get()
assay = self.make_Assay()
return assay
def __dreader__(self, **kwargs):
"""
The DataReader interacting method
"""
data = self.pipe(**kwargs)
return data
def _is_csv2(self):
"""
Tests if csv file is ; delimited (True) or common , (False)
"""
with open(self._src, "r") as openfile:
content = openfile.read()
if ";" in content:
return True
return False
def _has_header(self):
"""
Checks if column headers are provided in the data file
It does so by checking if the second element in the first row is numeric
if it is numeric (returns None << False) no headers are presumed. Otherwise
it returns 0 (as in first row has headers)...
"""
with open(self._src, "r") as openfile:
content = openfile.read().split("\n")[0]
content = content.split(self._delimiter)
try:
second_col = content[1]
second_col = float(second_col)
except ValueError:
return 0 # Headers in row 0
return None # no headers
# removed qpcr.Assay from inheritance here...
[docs]class MultiReader(SingleReader, aux._ID):
"""
Reads a single multi-assay datafile and reads assays-of-interest and normaliser-assays based on decorators.
Input Data Files
Valid input files are multi-assay irregular ``csv`` or ``excel`` files,
that specify assays by one replicate identifier column and one Ct value column.
Separate assay tables may be either below one another (separated by blank lines!)
or besides one another (requires `transpose = True`), but ALL in the SAME sheet!
Assays of interest and normaliser assays *must* be marked using ``decorators``.
Note
------
``MultiReader`` can transform the extracted datasets directly into ``qpcr.Assay`` objects using ``MultiReader.make_Assays()``.
It will perform grouping of assays if possible but will return raw-assays if not! ``get`` will either return a dictionary
of the raw dataframes or a list of ``qpcr.Assay``s.
Parameters
----------
filename : str
A filepath to a raw data file, containing multiple assays that were decorated.
Check out the documentation of the ``qpcr.Parsers``'s to learn more about decorators.
**kwargs
Any additional keyword arguments that should be passed to the ``read`` method which is immediately called during init if a filename is provided.
"""
__slots__ = ["_Parser", "_assay_pattern", "_assays", "_normalisers"]
def __init__(self, filename: str = None, **kwargs):
super(aux._ID, self).__init__()
self._src = filename
self._save_loc = None
self._replicates = None
self._names = None
self._Parser = None
self._assay_pattern = None
self._assays = {}
self._normalisers = {}
if self._src is not None:
self._Parser = Parsers.CsvParser() if self._filesuffix() == "csv" else Parsers.ExcelParser()
self.read(filename=self._src, **kwargs)
[docs] def clear(self):
"""
Clears all the extracted data from the Reader
"""
self._assays = {}
self._normalisers = {}
[docs] def assays(self, which: str = None):
"""
Parameters
----
which : str
If specified it only returns the data for the specified assay.
Otherwise (default) it returns all assays.
Returns
-------
data : dict or list
Returns either the raw dictionary of dataframes returned by the Parser
(if ``make_Assays`` has not been run yet)
or a list of ``qpcr.Assay`` objects.
names : list
A list of the names of all extracted assays.
"""
return self._get_from_which(self._assays, which)
[docs] def normalisers(self, which: str = None):
"""
Parameters
----
which : str
If specified it only returns the data for the specified normaliser.
Otherwise (default) it returns all normalisers.
Returns
-------
data : dict or list
Returns either the raw dictionary of dataframes returned by the Parser
(if ``make_Assays`` has not been run yet)
or a list of ``qpcr.Assay`` objects.
names : list
A list of the names of all extracted normalisers.
"""
return self._get_from_which(self._normalisers, which)
[docs] def get(self, which: str):
"""
Returns the stored assays or normalisers.
Parameters
----------
which : str
Can be either ``"assays"`` or ``"normalisers"`` or any specific assay identifier.
Returns
-------
data : dict or list
Returns either the raw dictionary of dataframes returned by the Parser
(if ``make_Assays`` has not been run yet)
or a list of ``qpcr.Assay`` objects.
"""
data = None
if which == "assays":
data = self._assays
elif which == "normalisers":
data = self._normalisers
else:
try:
data = self._get_from_which(self._assays, which)
except KeyError:
data = self._get_from_which(self._normalisers, which)
return data
[docs] def read(self, filename: str, **kwargs):
"""
Reads a multi-assay datafile with decorated assays.
Any non-decorated assays are ignored!
Parameters
----------
filename : str
A filepath to a raw data file, containing multiple assays that were decorated.
Check out the documentation of the ``qpcr.Parsers``'s to learn more about decorators.
**kwargs
Any additional keyword arguments that should be passed to the ``qpcr.Parsers``''s ``read`` method that extracts the datasets.
"""
self._src = filename
# check for a valid input file
if self._filesuffix() not in supported_filetypes:
e = aw.MultiReader("empty_data", file=self._src)
logger.critical(e)
SystemExit(e)
self._Parser = Parsers.CsvParser() if self._filesuffix() == "csv" else Parsers.ExcelParser()
# check if file should be read transposed
transpose = aux.from_kwargs("transpose", False, kwargs, rm=True)
if transpose:
self._Parser.transpose()
# setup a saving location if it was provided
if self.save_to() is not None:
self._Parser.save_to(self.save_to())
self._Parser.read(self._src, **kwargs)
[docs] def parse(self, **kwargs):
"""
Extracts the datasets (assays) from the read datafile.
Parameters
----------
**kwargs
Any additional keyword arguments that should be passed to the ``qpcr.Parsers``''s ``parse`` method that extracts the datasets.
"""
# check for the two required inputs (either decorator must be specified or assay_pattern)
# if neither is specified we default to using decorators!
decorator = aux.from_kwargs("decorator", None, kwargs)
assay_pattern = aux.from_kwargs("assay_pattern", None, kwargs)
# pass kwargs to Parser for setup
self._prep_parser(kwargs)
# set up parsing
if decorator is not None or assay_pattern is None:
self._parse_by_decorators(**kwargs)
elif assay_pattern is not None:
self._parse_by_pattern(kwargs, assay_pattern)
else:
e = aw.MultiReaderError("no_decorator_or_pattern")
logger.error(e)
[docs] def make_Assays(self):
"""
Convert all found assays and normalisers into ``qpcr.Assay`` objects.
"""
# convert assays to qpcr.Assay and overwrite current dict by new list
new_assays = []
for name, df in self._assays.items():
new_assay = self._make_new_Assay(name, df)
new_assays.append(new_assay)
self._assays = new_assays
# do the same for normalisers
new_normalisers = []
for name, df in self._normalisers.items():
new_assay = self._make_new_Assay(name, df)
new_normalisers.append(new_assay)
self._normalisers = new_normalisers
[docs] def pipe(self, filename: str, **kwargs):
"""
A wrapper for read+parse+make_Assays
Note
----
This is the suggested use of ``MultiReader``.
If a directory has been specified into which the datafiles shall be saved,
then saving will automatically be done.
Parameters
-------
filename : str
A filepath to an input datafile.
**kwargs
Any additional keyword argument that will be passed to any of the wrapped methods.
Returns
-------
data : tuple
A tuple of the found assays-of-interst (first element) and normaliser-assays (second element).
"""
# clear previously read data
self.clear()
# read new data
try:
self.read(filename, **kwargs)
except Exception as e:
logger.debug(e)
self.read(filename)
e = aw.ParserError("incompatible_read_kwargs", func=f"{type(self._Parser).__name__}'s read method")
logger.error(e)
# parse and make assays
self.parse(**kwargs)
self.make_Assays()
# return new data
assays = self.get(which="assays")
normalisers = self.get(which="normalisers")
return assays, normalisers
[docs] def save_to(self, location: str = None):
"""
Sets the location into which the individual assay datafiles should be saved.
Parameters
----------
location : str
The path to a directory where the newly generated assay datafiles shall be saved.
If this directory does not yet exist, it will be automatically made.
"""
if location is not None:
self._save_loc = location
if not os.path.exists(self._save_loc):
os.mkdir(self._save_loc)
return self._save_loc
def __dreader__(self, **kwargs):
"""
The DataReader interacting method
"""
replicates = aux.from_kwargs("replicates", None, kwargs, rm=True)
self.replicates(replicates)
names = aux.from_kwargs("names", None, kwargs, rm=True)
self.names(names)
data = self.pipe(**kwargs)
return data
def _get_from_which(self, dataset, which):
"""
The core for assayS() and normalisers() to get either all or a specific one
"""
if which is not None:
if aux.same_type(dataset, {}):
assay = dataset[which]
else:
assay = [i for i in dataset if i.id() == which][0]
return assay
else:
if aux.same_type(dataset, {}):
names = dataset.keys()
assays = dataset.values()
else:
names = [i.id() for i in dataset]
assays = dataset
return assays, names
def _parse_by_pattern(self, kwargs, assay_pattern):
"""
Parses the file only based on assay_pattern.
Note this will also work if a decorator has been specified additionally.
"""
self._Parser.parse(assay_pattern=assay_pattern, **kwargs)
assays = self._Parser.get()
self._assays = assays
def _parse_by_decorators(self, **kwargs):
"""
Parses the file and idenifies assays and normalisers
based on decorators
"""
aux.from_kwargs("decorator", None, kwargs, rm=True)
# get assays-of-interest
self._Parser.parse(decorator="qpcr:assay", **kwargs)
assays = self._Parser.get()
self._assays = assays
# save extracted files if so desired...
if self.save_to() is not None:
self._Parser.save()
# clear results and run again for normalisers
self._Parser.clear()
# get normaliser-assays
self._Parser.parse(decorator="qpcr:normaliser", **kwargs)
normalisers = self._Parser.get()
self._normalisers = normalisers
if self.save_to() is not None:
self._Parser.save()
def _prep_parser(self, kwargs):
"""
Passes kwargs to Parser and performs additional setup
"""
# setup assay_patterns if they were provided
assay_pattern = aux.from_kwargs("assay_pattern", None, kwargs, rm=True)
self._Parser.assay_pattern(assay_pattern)
# check if file should be read transposed
transpose = aux.from_kwargs("transpose", False, kwargs, rm=True)
if transpose:
self._Parser.transpose()
# get data column labels
id_label = aux.from_kwargs("id_label", "Name", kwargs, rm=True)
ct_label = aux.from_kwargs("ct_label", "Ct", kwargs, rm=True)
self._Parser.labels(id_label, ct_label)
[docs]class MultiSheetReader(MultiReader):
"""
Reads a single multi-assay datafile and reads assays-of-interest and normaliser-assays based on decorators.
Input Data Files
Valid input files are multi-assay irregular ``excel`` files,
that specify assays by one replicate identifier column and one Ct value column.
Separate assay tables may be either below one another (separated by blank lines!)
or besides one another (requires ``transpose = True``), but may be in DIFFERENT sheets.
All assays from all sheets will be read!
Assays of interest and normaliser assays *must* be marked using ``decorators``.
Parameters
----------
filename : str
A filepath to a raw data file, containing multiple assays that were decorated.
Check out the documentation of the ``qpcr.Parsers``'s to learn more about decorators.
**kwargs
"""
def __init__(self):
super().__init__()
[docs] def read(self, *args, **kwargs):
"""
The ``MultiSheetReader`` **only** offers a ``pipe`` method!
Hence, neither ``read`` nor `parse` will work directly!
"""
print("Sorry, the MultiSheetReader can currently only be used, through it's pipe() method!")
[docs] def parse(self, *args, **kwargs):
"""
The ``MultiSheetReader`` **only** offers a ``pipe`` method!
Hence, neither ``read`` nor ``parse`` will work directly!
"""
print("Sorry, the MultiSheetReader can currently only be used, through it's pipe() method!")
[docs] def pipe(self, filename: str, **kwargs):
"""
Reads a multi-assay and multi-sheet datafile with decorated assays.
Any non-decorated assays are ignored!
Parameters
----------
filename : str
A filepath to a raw data file, containing multiple assays that were decorated.
Check out the documentation of the ``qpcr.Parsers``'s to learn more about decorators.
**kwargs
Any additional keyword arguments that should be passed to the ``qpcr.Parsers``''s ``read`` method that extracts the datasets.
Returns
-------
assays : dict or list
Returns either the raw dictionary of dataframes returned by the Parser (if no qpcr.Assays could be made automatically)
or a list of ``qpcr.Assay`` objects.
normalisers : dict or list
Returns either the raw dictionary of dataframes returned by the Parser
(if no qpcr.Assays could be made automatically)
or a list of ``qpcr.Assay`` objects.
"""
self._src = filename
# read file to get all sheets
sheets = pd.read_excel(filename, sheet_name=None)
all_assays = {}
all_normalisers = {}
# now repetitively read all sheets and extract data
reader = MultiReader()
for sheet in sheets.keys():
try:
# read file and parse data
kws = deepcopy(kwargs)
reader.read(filename, sheet_name=sheet)
reader.parse(ignore_empty=True, **kws)
# get assays
assays, normalisers = reader.get("assays"), reader.get("normalisers")
all_assays.update(assays)
all_normalisers.update(normalisers)
except Exception as e:
# ERROR HERE
_e = aw.MultiSheetReaderError("sheet_unreadable", sheet=sheets, e=e)
logger.error(_e)
# store data
self._assays = all_assays
self._normalisers = all_normalisers
# try making Assays directly, return dictioanries if not possible...
try:
self.make_Assays()
except Exception as e:
print(e)
assays, normalisers = self._assays, self._normalisers
return assays, normalisers
def __dreader__(self, **kwargs):
"""
The DataReader interacting method
"""
replicates = aux.from_kwargs("replicates", None, kwargs, rm=True)
self.replicates(replicates)
names = aux.from_kwargs("names", None, kwargs, rm=True)
self.names(names)
data = self.pipe(**kwargs)
return data
[docs]class BigTableReader(MultiReader):
"""
Reads a single multi-assay datafile and reads assays-of-interest and normaliser-assays based on decorators.
Input Data Files
Valid input files are multi-assay irregular ``csv`` or ``excel`` files,
that specify assays as one big table containing all information together.
Note that this implies that the entire data is stored in a single sheet (if using ``excel`` files).
Three possible data architectures are allowed:
- ``Vertical`` Big Tables
Big Tables of this kind require three columns (any additional columns are disregarded):
one specifying the assay, one specifying the replicate identifiers, and one specifying the Ct values.
An additional fourth column (`@qpcr`) may be filled with decorators but this is not necessary in this setup.
- ``Horizontal`` Big Tables
Big Tables of this kind store replicates from assays in side-by-side columns.
The replicates may be labelled numerically or all have the same column header.
A second column is required specifying the replicate identifier.
Note, this kind of setup *requires* decorators above the first replicate of each assay,
as well as user-defined ``replicates``!
Note, the column headers have to be **unique** to the table!
Also, a word of warning with regard to replicate *assays*. The entries in the ``assay`` defining column *must* be unique! If you have multiple assays from the same gene which therefore also have the same id they will be interpreted as belonging together and will be assembled into the same ``qpcr.Assay`` object. However, this will result in differently sized *Assays* which will cause problems downstream when you (or a ``qpcr.Normaliser``) try to assemble a `qpcr.Results` object!
- ``Hybrid`` Big Tables
Big Tables of this kind store Ct values of different assays in separate side-by-side columns,
but they store the replicate identifiers as a separate column. Hence, they combine aspects of vertical and horizontal Big Tables.
Note, two options exist to read this kind of setup.
- A ``list`` of ``ct_col`` values can be passed which contains the column header of each assay.
- The table can be `decorated`, in which case only decorated assays (columns) are extracted.
Please, note that the two methods of reading this table are mutually exclusive! So,
if you decorate your table you cannot pass specific assay headers to the ``ct_col`` argument anymore.
"""
__slots__ = ["_kind", "_id_col", "_ct_col", "_assay_col", "_is_regular", "_hybrid_decorated"]
def __init__(self):
super().__init__()
self._assays = {}
self._normalisers = {}
self._data = None
self._Parser = None
self._kind = None # horizontal or vertical
self._id_col = None
self._ct_col = None
self._assay_col = None
self._is_regular = False # store if the datafile was regular and does not
# have to be converted to a dataframe based on a numpy array...
self._hybrid_decorated = False # because hybrid bigtables require decorator input separately for both read and parse, we store the info so it only needs to be passed during read...
[docs] def pipe(self, filename: str, kind: str, id_col: str, **kwargs):
"""
A wrapper for read+parse+make_Assays
Note
-------
This is the suggested use of the ``BigTableReader``.
Parameters
----------
filename : str
A filepath to a raw data file, containing multiple assays that were decorated.
Check out the documentation of the ``qpcr.Parsers``'s to learn more about decorators.
kind : str
Specifies the kind of Big Table from the file.
This may either be ``"horizontal"``, ``"vertical"``, or ``"hybrid"``.
id_col : str
The column header specifying the replicate identifiers
(or "assays" in case of ``horizontal`` big tables).
**kwargs
Any additional columns or keyword arguments.
Returns
-------
assays : dict or list
Returns either the raw dictionary of dataframes returned by the Parser (if no qpcr.Assays could be made automatically)
or a list of ``qpcr.Assay`` objects.
normalisers : dict or list
Returns either the raw dictionary of dataframes returned by the Parser
(if no qpcr.Assays could be made automatically)
or a list of ``qpcr.Assay`` objects.
"""
replicates = aux.from_kwargs("replicates", None, kwargs)
names = aux.from_kwargs("names", None, kwargs, rm=True)
self.read(filename=filename, kind=kind, id_col=id_col, **kwargs)
self.parse(**kwargs)
self.replicates(replicates)
self.names(names)
self.make_Assays()
assays, normalisers = self.get("assays"), self.get("normalisers")
return assays, normalisers
[docs] def read(self, filename: str, kind: str, id_col: str, **kwargs):
"""
Reads a regular or irregular ``csv`` or ``excel`` datafile that contains data stored
in a single big table. Files are first tried to be read regularly, if this fails,
the Reader resorts to parsing to identify the relevant sections of the data.
Parameters
----------
filename : str
A filepath to a raw data file, containing multiple assays that were decorated.
Check out the documentation of the ``qpcr.Parsers``'s to learn more about decorators.
kind : str
Specifies the kind of Big Table from the file.
This may either be ``"horizontal"``, ``"vertical"``, or ``"hybrid"``.
id_col : str
The column header specifying the replicate identifiers
(or "assays" in case of ``horizontal`` big tables).
**kwargs
Any additional columns or keyword arguments.
"""
self._src = filename
self._kind = kind
is_horizontal = self._kind == "horizontal"
self._id_col = id_col
self._ct_col = aux.from_kwargs("ct_col", None, kwargs, rm=True)
self._assay_col = aux.from_kwargs("assay_col", None, kwargs, rm=True)
if not is_horizontal:
# first try default pd.read_csv or read_excel
self._try_simple_read(**kwargs)
# check if we got data, and abort if so
if self._data is not None:
self._df = self._data
self._is_regular = True
return
# if haven't got data, then we go to parsing...
# set is_horizontal to True for hybrid
# tables that are decorated
if self._kind == "hybrid" and aux.from_kwargs("decorator", False, kwargs):
is_horizontal = True
self._hybrid_decorated = True
# setup Parser to get data
self._Parser = Parsers.CsvParser() if self._filesuffix() == "csv" else Parsers.ExcelParser()
self._Parser.read(self._src, **kwargs)
self._Parser.labels(id_label=self._id_col)
self._Parser._make_BigTable_range(is_horizontal=is_horizontal)
self._data = self._Parser._bigtable_range
[docs] def parse(self, **kwargs):
"""
Parses the big table and extracts the individual assays.
"""
if self._kind == "vertical":
self._parse_vertical(**kwargs)
elif self._kind == "horizontal":
self._parse_horizontal(**kwargs)
elif self._kind == "hybrid":
decorator = aux.from_kwargs("decorator", self._hybrid_decorated, kwargs, rm=True)
self._parse_hybrid(decorator=decorator, **kwargs)
def _parse_hybrid(self, **kwargs):
"""
Extracts assay datasets for hybrid big tables,
it gets first the id_col and then all ct_cols,
and assembles new dfs of them into a dict
"""
# first check the kind of data we got because it
# can either be a pandas dataframe or a numpy ndarray
# depending on whether or not the file is regular and/or decorated
# it's a dataframe if it's a "regular" big table
if isinstance(self._data, pd.DataFrame):
# check if we got ct cols to extract
if self._ct_col is None:
e = aw.BigTableReaderError("no_ct_cols")
logger.critical(e)
SystemExit(e)
# we got a nice dataframe with columns in to extract directly
data = self._extract_from_hybrid_dataframe(data=self._data, to_extract=self._ct_col)
# since in this setting the data was not decorated,
# we store all datasets into the assays
self._assays = data
# it's a numpy ndarray if it's an "irregular" big table
else:
# check if the file is supposed to be decorated
decorator = aux.from_kwargs("decorator", False, kwargs)
if decorator:
# in case it is decorated we extract
# data based on the decorators
self._hybrid_bigtable_extract_by_decorator()
# if we dont have decorated data, we must have received a list for
# ct_col values to use. So we can just transform the np.ndarray to a datafram
# and use the same approach as for "regular" big tables...
else:
# check if we got ct cols to extract
if self._ct_col is None:
e = aw.BigTableReaderError("no_ct_cols")
logger.critical(e)
SystemExit(e)
# transform the numpy array from the
# hybrid bigtable to a pandas dataframe
data = self._convert_hybrid_nparray_to_dataframe()
# we got a nice dataframe with columns in to extract directly
data = self._extract_from_hybrid_dataframe(data=data, to_extract=self._ct_col)
# since in this setting the data was not decorated,
# we store all datasets into the assays
self._assays = data
def _convert_hybrid_nparray_to_dataframe(self):
"""
Converts a numpy array of a hybrid bigtable
generated by one of the Parsers into a
pandas DataFrame. It returns the DataFrame
"""
# first get the row in which the id_col header is located
# this should be the second row because if it was the first,
# then we would have gotten a "regular" table anyway, so there
# must actually be some decorators present, but the user decided
# to ignore them. But to be save, we specifically search again.
starting_index = np.argwhere(self._data == self._id_col)
# and get the row index. We assume there is only one hit for the
# id_col since if it were otherwise, the Parsers would have noticed
starting_index = starting_index.reshape(starting_index.size)
starting_index = starting_index[0]
# and now we can assemble the data for the dataframe
names = self._data[starting_index, :]
data = self._data[(starting_index + 1) :, :]
data = pd.DataFrame(data, columns=names)
return data
def _generate_subset_dataframe(self, data, decorator):
"""
Generates a pandas DataFrame of a subset of columns
from a decorated hybrid big table.
Note
----
This is a downsized version of the ``qpcr.Parsers`` ``find_assays`` core.
Parameters
-------
data : np.ndarray
A numpy array to search in. The **first** row will be searched.
"""
# compile decorator to search for
decorator = Parsers.decorators[decorator]
decorator = re.compile(decorator)
# get first row of the data
array = data[0, :]
# set up an index array for the columns that match
indices = np.zeros(len(array))
# iterate over all entries in the first row
idx = 0
for entry in array:
match = decorator.search(entry)
if match is not None:
indices[idx] = 1
idx += 1
# get matching indices and reduce dimensionality
indices = np.argwhere(indices == 1)
indices = indices.reshape(indices.size)
# now also add the id_col column to the set of relevant indices
# for this we search in the second row, but with exact matching.
array = data[1, :]
id_col = np.argwhere(array == self._id_col)
id_col = id_col.reshape(id_col.size)
# and merge the id_col to the found indices
indices = np.concatenate((id_col, indices))
# now get the datframe relevant subset and convert into a DataFrame
# we get all rows except the first (since there are the decorators)
# and only the columns with matching indices.
names = data[1, indices]
df = data[2:, indices]
# convert to numeric
# we use the _convert_to_numeric method from the Parsers to do that
# we iterate over each column (except the first where the identifiers are stored)
# and convert all entries to numeric...
tmp_parser = Parsers.CsvParser()
for i in range(1, df.shape[1]):
df[:, i] = tmp_parser._convert_to_numeric("None (from BigTableReader)", df[:, i])
# convert to dataframe
df = pd.DataFrame(df, columns=names)
return df
def _extract_from_hybrid_dataframe(self, data, to_extract):
"""
Extracts datasets from a hybrid dataframe
Parameters
----------
data : pd.DataFrame
The dataframe to extract from. This needs to include both the id_col and all ct_cols.
to_extract : str or list
The column names to of Ct columns extract (the id_col is referenced from self._id_col )
Returns
-------
dfs : dict
A dictionary of all extracted assays (assay id as key, df as value).
"""
# check which columns we should extract
# and convert to list if we only have a single one
# just so we can use a loop uniformly
if not isinstance(to_extract, (list, tuple)):
to_extract = [to_extract]
# get the id_column
id_col = data[self._id_col]
# and convert to string (just to be sure)
id_col = id_col.astype(str)
# setup a dict for the assay dataframes
dfs = {}
# iterate over all assays to extract
for col in to_extract:
# get ct values
ct_col = data[col]
# convert to float (i.e. introduce nan were necessary)
# to that end we use the same approach as the Parsers .make_dataframes()
try:
ct_col = ct_col.astype(float)
except Exception as e:
logger.debug(e)
logger.debug("Could not convert to float directly. Trying to convert to string and then to float.")
ct_col = np.array(ct_col, dtype=str)
try:
ct_col = np.genfromtxt(ct_col)
except Exception as e:
logger.debug(e)
logger.debug("Could not convert to float after string conversion. Resorting to regex matching...")
faulties = np.argwhere([Parsers.float_pattern.match(i) is None for i in ct_col])
ct_col[faulties] = "nan"
ct_col = np.genfromtxt(ct_col)
# and assemble new dataframe
tmp = pd.DataFrame({raw_col_names[0]: id_col, raw_col_names[1]: ct_col}) # using qpcr default column headers
# and save dataframe
dfs[col] = tmp
# and store to data
return dfs
def _parse_horizontal(self, **kwargs):
"""
Extracts assay datasets for a horizontal big table
by first transforming it to a vertical one and then using
the vertical parse
"""
# transform data into vertical
self._data = self._Parser._infer_BigTable_groups(**kwargs)
# transform array into df
# and set _is_regular to True so _parse_vertical
# wont try to also convert to df
self._make_vertical_range_df()
self._is_regular = True
# and parse_vertical to get assays
ct_col = raw_col_names[1]
assay_col = default_dataset_header
self._id_col = raw_col_names[0]
self._parse_vertical(ct_col=ct_col, assay_col=assay_col, **kwargs)
def _parse_vertical(self, **kwargs):
"""
Extracts assay datasets for vertical big tables
"""
self._ct_col = aux.from_kwargs("ct_col", None, kwargs, rm=True)
self._assay_col = aux.from_kwargs("assay_col", None, kwargs, rm=True)
# in case of vertical tables, check if we got ct_col and assay_col
got_no_cols = self._ct_col is None or self._assay_col is None
if self._kind == "vertical" and got_no_cols:
e = aw.BigTableReaderError("no_cols", ct_col=self._ct_col, assay_col=self._assay_col)
logger.critical(e)
SystemExit(e)
# convert to pandas dataframe in case
# the data had to be parsed
if not self._is_regular:
self._make_vertical_range_df()
# and test if the ones we have are good
if not self._test_cols_are_good():
e = aw.BigTableReaderError("no_cols", ct_col=self._ct_col, assay_col=self._assay_col)
logger.critical(e)
SystemExit(e)
df = self._data
assay_col_header = self._assay_col
ct_col_header = self._ct_col
id_col_header = self._id_col
# get columns to include
cols_to_use = [id_col_header, ct_col_header]
# now read the separate assays and store in assays and normalisers
if self._vertical_decorated():
# cols_to_use.append("@qpcr")
self._get_vertical_assays_decorated(df, assay_col_header, ct_col_header, cols_to_use)
else:
self._get_vertical_assays_not_decorated(df, assay_col_header, ct_col_header, cols_to_use, self._assays)
def _get_vertical_assays_decorated(self, df, assay_col_header, ct_col_header, cols_to_use):
"""
Gets assays based on decorators, storing them in _assays and _normalisers
"""
# get assays
tmp = df.query("`@qpcr` == 'assay'")
self._get_vertical_assays_not_decorated(tmp, assay_col_header, ct_col_header, cols_to_use, self._assays)
# get normalisers
tmp = df.query("`@qpcr` == 'normaliser'")
self._get_vertical_assays_not_decorated(tmp, assay_col_header, ct_col_header, cols_to_use, self._normalisers)
def _get_vertical_assays_not_decorated(self, df, assay_col_header, ct_col_header, cols_to_use, store_in):
"""
Gets assays without a decorator, storing them all in self._assays...
"""
# get default names for the id and ct columns
to_defaults = {_from: _to for _from, _to in zip([self._id_col, self._ct_col], raw_col_names)}
# iterate over all assays
assays = set(df[assay_col_header])
for assay in assays:
# get the assay subset
subset = df.query(f"`{assay_col_header}` == '{assay}'")
subset = subset[cols_to_use]
# make Cts numeric
cts = subset[ct_col_header].to_numpy()
cts = np.genfromtxt(np.array(cts, dtype=str))
subset[ct_col_header] = cts
subset = subset.reset_index(drop=True)
# rename to defaults
subset = subset.rename(columns=to_defaults)
# save assay
store_in.update({assay: subset})
def _vertical_decorated(self):
"""
Checks if a vertical bigtable is decorated
"""
return "@qpcr" in self._data.columns
def _make_vertical_range_df(self):
"""
Converts the numpy array from the Parser
to a pandas dataframe in case of irregular vertical big tables.
"""
data = self._data
headers = data[0, :]
data = data[1:, :]
df = pd.DataFrame(data, columns=headers)
self._data = df
def _test_cols_are_good(self):
"""
Tests if specified ct and assay cols are valid
"""
cols = self._data.columns
all_good = self._ct_col in cols and self._assay_col in cols
return all_good
def _try_simple_read(self, **kwargs):
"""
Try default readings without parsing in case the file is a regular
csv or excel file (only works in case of vertical big tables).
"""
if self._filesuffix() == "csv":
delimiter = ";" if self._is_csv2() else ","
data = pd.read_csv(self._src, delimiter=delimiter)
else:
sheet_name = aux.from_kwargs("sheet_name", 0, kwargs, rm=True)
data = pd.read_excel(self._src, sheet_name=sheet_name)
# check if we got the data we looked for...
got_regular_data = self._id_col in data.columns
if got_regular_data:
self._data = data
def __dreader__(self, **kwargs):
"""
The DataReader interacting method
"""
# replicates = aux.from_kwargs("replicates", None, kwargs)
# self.replicates(replicates)
# names = aux.from_kwargs("names", None, kwargs, rm = True)
# self.names(names)
data = self.pipe(**kwargs)
return data
def _hybrid_bigtable_extract_by_decorator(self):
"""
Extracts assays and normalisers based
on decorators from a decorated hybrid big table.
"""
# find all assays first
data = self._generate_subset_dataframe(data=self._data, decorator="qpcr:assay")
# now that we have a nice dataframe
# we can select all columns that are not the id_col as our ct_cols of interest
# and use the same _extract_from_hybrid_dataframe as for the undecorated hybrid tables.
ct_cols = [i for i in data.columns if i != self._id_col]
data = self._extract_from_hybrid_dataframe(data=data, to_extract=ct_cols)
# and save to the assays
self._assays = data
# now repeat the same for the normalisers
data = self._generate_subset_dataframe(data=self._data, decorator="qpcr:normaliser")
ct_cols = [i for i in data.columns if i != self._id_col]
data = self._extract_from_hybrid_dataframe(data=data, to_extract=ct_cols)
# and save to the normalisers
self._normalisers = data
if __name__ == "__main__":
# multisheet_file = "/Users/NoahHK/Downloads/Corti IPSCs July 2019_decorated.xlsx"
# decorated_excel = "./__parser_data/excel 3.9.19_decorated.xlsx"
# reader = MultiReader()
# reader.read(decorated_excel, sheet_name = 1)
# reader.parse(decorator = True, ignore_empty = True, assay_pattern = "Rotor-Gene")
# reader.make_Assays()
# r = reader.get("assays")
# print(r[0].get())
# assert r is not None, "MultiReader failed somewhere..."
# reader = MultiSheetReader()
# reader.pipe(
# multisheet_file,
# # decorator = True,
# assay_pattern = "Rotor-Gene"
# )
# print(reader.get("Actin"))
# bigtable_horiztonal = "/Users/NoahHK/Downloads/Local_cohort_Adenoma_qPCR_rawdata_decorated.xlsx"
# bigtable_vertical = "/Users/NoahHK/Downloads/qPCR all plates.xlsx"
# reader = BigTableReader()
# reader.read(bigtable_vertical, kind = "vertical", id_col = "Individual")
# reader.parse(ct_col = "Ct", assay_col = "Gene")
# reader.make_Assays()
# r = reader.get("assays")
# print(r[0].get(), r[0].id())
# reader.clear()
# assays, normalisers = reader._DataReader(
# filename = bigtable_horiztonal,
# kind = "horizontal",
# id_col = "tissue_number",
# replicates = (3,4),
# names = ["Gapdh", "Sord1"]
# )
# r = normalisers
# print(r[0].get())
# reader = qpcr.DataReader()
# r = reader.read( "./Examples/Example Data/actin.xlsx", header = 0, replicates = None, id = "myActin")
# # r = reader.make_Assay()
# print(r.get(), r.id())
# reader.read( "./Examples/Example Data/actin_nan.csv", replicates = 6, id_label = "Hii", id = "myActin", is_regular = True )
# r = reader.make_Assay()
# print(r.get(), r.id())
hybrid_bigtable = "./Examples/Example Data/Big Table Files/hybrid_bigtable.xlsx"
# sheet 0 is not decorated
# sheet 1 is decorated
reader = BigTableReader()
reader.read(
hybrid_bigtable,
kind="hybrid",
id_col="group",
# ct_col = ["TLR1", "TLR4", "GAPDH"],
decorator=True,
sheet_name=1,
)
reader.parse()
reader.make_Assays()
r = reader.assays()
print(r)
print(r[0][0].get())
print(r[0][0].get().dtypes)
r = reader.normalisers()
print(r)