"""
This is the ``qpcr.Assay`` class, whose job is to store qPCR datasets. It is a central data-handling class in ``qpcr``.

Setting up a ``qpcr.Assay``
===========================

To create a ``qpcr.Assay`` manually, first use either the ``qpcr.DataReader`` or any one of the :ref:`qpcr.Readers <qpcr.Readers>` directly to
read in your data and generate a pandas DataFrame. Note that the ``qpcr.Readers`` are already equipped with ``make_Assay(s)`` methods that
set up ``qpcr.Assay`` objects for you.

However, setting up a ``qpcr.Assay`` manually can be as simple as:

.. code-block:: python

    # get the dataframe from one of the qpcr.Readers
    mydata = some_reader.get()
    assay = Assay( df = mydata, id = "my_assay" )

If all replicates within a group share the same replicate identifier, the groups are inferred automatically and your assay is
already ready to be passed to a ``qpcr.Analyser``. If not, you can specify the replicates manually like this:

.. code-block:: python

    # manually specify triplicates during setup
    assay = Assay( df = mydata, id = "my_assay", replicates = 3 )

    # or you can change the replicates after initial setup like
    assay = Assay( df = mydata, id = "my_assay" )
    assay.group( replicates = 3 )

We can now interact with the ``qpcr.Assay``. Assays support direct item setting, getting, and deleting on their dataframes.

.. code-block:: python

    # we could for instance fill a new column with only ones
    assay[ "my_new_column" ] = 1

    # or get the id column from the assay
    ids = assay[ "id" ]

Specifying (Groups of) Replicates
=================================

The groups are essential to analysing our data, so ``qpcr`` needs to know how the data is grouped. By the way, if you are unfamiliar with "groups" check out this

Usually we do not need to do anything here, because a ``qpcr.Assay`` can infer the groups of replicates in your data
automatically from the replicate identifiers. If the inference fails, you will be asked to provide the replicate settings manually.
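
As a rough illustration of this inference, the standalone sketch below assigns one numeric group per distinct replicate identifier, in order of first appearance. The function name `infer_groups` and the example identifiers are hypothetical and not part of ``qpcr``; the actual inference happens inside the ``qpcr.Assay``.

```python
def infer_groups(identifiers):
    # Map each distinct identifier to a group index, in order of first appearance.
    index_of = {}
    groups = []
    for name in identifiers:
        # setdefault assigns the next free index the first time a name is seen
        groups.append(index_of.setdefault(name, len(index_of)))
    return groups


# hypothetical replicate identifiers: two triplicates and a single diluent sample
ids = ["ctrl", "ctrl", "ctrl", "KO", "KO", "KO", "diluent"]
groups = infer_groups(ids)
print(groups)  # [0, 0, 0, 1, 1, 1, 2]
```

This only works when all replicates of a group carry exactly the same identifier, which is why manual replicate settings are needed otherwise.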

If you want to (or have to) specify the replicate settings manually, a ``qpcr.Assay`` accepts an input ``replicates`` for this information.
This input can be an ``integer``, a ``tuple``, or a ``string``. Why is that?
Normally we perform experiments as "triplicates", "duplicates", or some other multiplets.
Hence, if every group has the same number of replicates (say all triplicates), we can simply specify this number as ``replicates = 3``.
However, some samples might only be measured as unicates (such as the diluent sample), while others are triplicates.
In these cases the dataset does not have uniformly sized groups of replicates, and a single number cannot describe them.
For these cases you can specify the number of replicates in each group separately as a ``tuple`` such as ``replicates = (3,3,3,3,1)``, or as a ``string`` "formula"
that avoids repeating the same number many times, such as ``replicates = "3:4,1"``, which translates into the same tuple as the one we specified manually.
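
To make the formula notation concrete, here is a minimal pure-Python sketch of how such a string could be expanded into a tuple. The helper `expand_formula` is hypothetical and only mirrors the behaviour described above; it is not the function ``qpcr`` uses internally.

```python
def expand_formula(formula):
    # Expand a formula such as "3:4,1" into a tuple of group sizes.
    sizes = []
    for part in formula.split(","):
        # "n:m" means m groups of n replicates; a bare "n" means a single group
        n, _, m = part.partition(":")
        sizes.extend([int(n)] * (int(m) if m else 1))
    return tuple(sizes)


print(expand_formula("3:4,1"))  # (3, 3, 3, 3, 1)
```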
"""
import pandas as pd
import numpy as np
import os
import qpcr.defaults as defaults
import qpcr._auxiliary as aux
import qpcr._auxiliary.warnings as aw
from copy import deepcopy
logger = aux.default_logger()
raw_col_names = defaults.raw_col_names
class Assay(aux._ID):
"""
The central storing unit of single datasets that were read from datafiles.
A `qpcr.Assay` stores the replicate identifiers and Ct values, and also
groups these according to the `replicates` information (which is automatically
inferred by default). Groups of replicates can be arbitrarily renamed by the user.
Note
-------
The new implementation of the `qpcr.Assay` works directly with a DataFrame
that was generated by any one of the `qpcr.Readers` or `qpcr.Parsers`.
Parameters
----------
df : pandas.DataFrame
A DataFrame produced by one of the `qpcr.Readers`
containing an `id` column for the replicate identifiers
and a `Ct` value column.
id : str
The identifier of the assay (essentially the Assay name).
replicates : int or tuple or str
Can be an `integer` (equal group sizes, e.g. `3` for triplicates),
or a `tuple` (uneven group sizes, e.g. `(3,2,3)` if the second group is only a duplicate).
Another method to achieve the same thing is to specify a "formula" as a string of how to create a replicate tuple.
The allowed structure of such a formula is `n:m,` where `n` is the number of replicates in a group and `m` is the number of times
this pattern is repeated (if no `:m` is specified `:1` is assumed). See `qpcr.Assay.replicates` for an example.
group_names : list
A list of names to use for the replicates groups. If replicates of the same group share the same identifier, then the
group will be inferred automatically. Otherwise, default group names will be set if no `group_names` are provided.
"""
__slots__ = [
"_df",
"_id",
"_efficiency",
"_eff",
"_replicates",
"_group_names",
"_groups",
]
def __init__(
self,
df: pd.DataFrame,
id: str = None,
replicates: (int or tuple or str) = None,
group_names: list = None,
):
super().__init__()
if isinstance(df, pd.DataFrame):
self._df = df
else:
raise TypeError(
f"df argument must be a pandas DataFrame (got {type(df).__name__})"
)
if id is not None:
self._id = id
# get replicates
self._replicates = replicates
# store names
self._names = group_names
# setup the amplification efficiency
self._efficiency = 1.0
self._eff = 2 * self._efficiency
# now try to group the data
try:
self.replicates(self._replicates)
self.group()
except Exception as e:
raise aw.AssayError("setup_not_grouped") from e
# and try to change names, provided that we could group yet...
if self._names is not None and self.groups() is not None:
self.rename(self._names)
def efficiency(self, eff: float = None):
"""
Gets or sets the amplification efficiency of the Assay.
Parameters
----------
eff : float
A new efficiency to assign to the assay.
Returns
-------
float
The currently assigned efficiency.
"""
logger.debug(f"{eff=}")
if isinstance(eff, float):
self._efficiency = eff
self._eff = 2 * self._efficiency
logger.info(
f"New efficiency set to {self._efficiency} (computes as binary factor {self._efficiency} * 2 = {self._eff})"
)
elif eff is None:
return self._efficiency
else:
raise TypeError(
f"Expected a float efficiency but got type {type(eff).__name__}"
)
def save(self, filename: str):
"""
Saves the data from the `Assay` to a `csv` file.
Parameters
----------
filename : str
The filename into which the assay should be stored.
If this is a `directory`, then the assay `id` will automatically
be used as filename.
"""
if os.path.isdir(filename):
filename = os.path.join(filename, f"{self.id()}.csv")
# write the stored dataframe (the Assay itself defines no `to_csv` method)
self._df.to_csv(filename, index=False)
def get(self, copy: bool = False):
"""
Parameters
----------
copy : bool
If `True` returns a deepcopy of the stored dataframe.
Returns
-------
data : pandas.DataFrame
The stored dataframe
"""
if copy:
data = deepcopy(self._df)
else:
data = self._df
return data
def boxplot(self, mode: str = None, **kwargs):
"""
A shortcut to call a `qpcr.Plotters.ReplicateBoxPlot` plotter
to visualise the loaded replicates.
Parameters
----------
mode : str
The plotting mode. May be either "static" (matplotlib) or "interactive" (plotly).
**kwargs
Any additional keyword arguments to be passed to the plotter.
Returns
-------
fig : plt.figure or plotly.figure
The figure generated by `ReplicateBoxPlot`.
"""
import qpcr.Plotters as Plotters
plotter = Plotters.ReplicateBoxPlot(mode=mode)
plotter.link(self)
fig = plotter.plot(**kwargs)
return fig
def __qplot__(self, **kwargs):
return self.boxplot
def tile(self, n: int = 1):
"""
Expands the dataframe to the square number of entries for each group.
This is useful for combinatoric normalisation wherein each replicate is normalised
against each replicate group-wise from the normaliser, instead of only its supposed partner value.
Parameters
----------
n : int
The number of tiles to produce. By default `1 tile` will effectively *square* the number of entries within the dataframe.
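The effect of tiling on a single group can be sketched with plain lists, assuming (as in the method body) that each group's entries are repeated `len(group) * n` times. The helper `tile_group` and the Ct values are hypothetical stand-ins for a dataframe subset.

```python
def tile_group(values, n=1):
    # Repeat the group's entries len(values) * n times, keeping their order.
    return values * (len(values) * n)


triplicate = [24.1, 24.3, 24.2]  # made-up Ct values of one triplicate
tiled = tile_group(triplicate)
print(len(tiled))  # 9, i.e. 3 entries squared
```

With `n = 2`, the same triplicate would grow to 18 entries.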
"""
df = self._df
groups = self.groups()
new = None
for group in groups:
subset = df.query(f"group == {group}")
length = len(subset) * n
subset = pd.concat([subset for i in range(length)], ignore_index=True)
if new is None:
new = subset
else:
new = pd.concat([new, subset], ignore_index=True)
self.adopt(new)
return self
def stack(self, n: int = 2):
"""
Expands the dataframe by repeating the entries of each group `n` times.
Parameters
----------
n : int
The total number of copies of each replicate to keep. `n = 2` (the default) introduces one more copy of each replicate,
while `n = 1` keeps the current entries unchanged.
"""
df = self.get()
groups = self.groups()
n = int(n)
new = None
if n > 1:
for group in groups:
subset = df.query(f"group == {group}")
length = n
subset = pd.concat([subset for i in range(length)], ignore_index=True)
if new is None:
new = subset
else:
new = pd.concat([new, subset], ignore_index=True)
self.adopt(new)
return self
@property
def Ct(self):
"""
Returns
-------
Ct : pandas.Series
A pandas Series with the assay's Ct values. The Series is named
`"{id}_Ct"`, using the assay's `id`.
"""
Ct = self._df[raw_col_names[1]]
Ct.name = f"{self.id()}_Ct"
return Ct
@Ct.setter
def Ct(self, Ct):
"""
Sets the Ct values of the Assay.
"""
self._df[raw_col_names[1]] = Ct
@property
def dCt(self):
"""
Returns
-------
dCt : pandas.Series
A pandas Series with the computed Delta-Ct values. The Series is named
`"{id}_dCt"`, using the assay's `id`.
"""
dCt = self._df["dCt"]
dCt.name = f"{self.id()}_dCt"
return dCt
@dCt.setter
def dCt(self, dCt):
"""
Sets the Delta-Ct values of the Assay.
"""
self._df["dCt"] = dCt
@property
def ddCt(self):
"""
Returns
-------
ddCt : pandas.DataFrame
A pandas DataFrame with all Delta-Delta-Ct values that the Assay has stored.
All `"rel_{}"` columns are renamed to include the assay `id` to `"{id}_rel_{}"`.
"""
# get all ddCt columns
ddCt = [i for i in self._df.columns if "rel_" in i]
id = self._id
# make new names and generate renaming dictionary
new_names = [f"{id}_{i}" for i in ddCt]
new_names = {old: new for new, old in zip(new_names, ddCt)}
# get the data and rename
ddCt = self._df[ddCt]
if not isinstance(ddCt, pd.DataFrame):
ddCt = pd.DataFrame(ddCt)
ddCt = ddCt.rename(columns=new_names)
ddCt.name = f"{self.id()}_ddCt"
return ddCt
@property
def ddCt_cols(self):
"""
Returns
-------
cols
A list of all rel_{} columns within the Assay's dataframe.
"""
return [i for i in self._df.columns if "rel_" in i]
# FUTURE FEATURE HERE
# def fc(self):
# some method to also return the fold change columns...
@property
def data_cols(self):
"""
Returns
-------
cols
A list of all non-setup columns in the dataframe.
"""
return [i for i in self._df.columns if i not in defaults.setup_cols]
@property
def columns(self):
return self._df.columns
def rename_cols(self, cols: dict):
"""
Renames columns according to a dictionary as key -> value.
Parameters
----------
cols : dict
A dictionary specifying old column names (keys) and new column names (values).
"""
self._df = self._df.rename(columns=cols)
def n(self):
"""
Returns
-------
int
The number of entries (individual replicates) within the Assay.
"""
return len(self._df) # self._length
def add_dCt(self, dCt: pd.Series):
"""
Adds results from Delta-Ct (first Delta-Ct performed by a `qpcr.Analyser`).
Parameters
-----------
dCt : pandas.Series
A pandas Series of Delta-Ct values that will be stored in a column `"dCt"`.
Note, that each `Assay` can, of course, only store one single Delta-Ct column.
"""
self._df["dCt"] = dCt
def add_ddCt(self, normaliser_id: str, ddCt: pd.Series):
"""
Adds results from Delta-Delta-Ct ("normalisation" performed by a `qpcr.Normaliser`).
These will be stored in a column named `"rel_{normaliser_id}"`. Hence, an Assay can store
an arbitrary number of Delta-Delta-Ct columns against an arbitrary number of different normalisers.
Parameters
----------
normaliser_id : str
The id of the normaliser Assay used to compute the Delta-Delta-Ct values.
ddCt : pandas.Series
A pandas Series of Delta-Delta-Ct values.
"""
name = f"rel_{normaliser_id}"
self._df[name] = ddCt
# FUTURE FEATURE HERE
# some method to add fc columns here...
def adopt(self, df: pd.DataFrame):
"""
Adopts an externally computed dataframe as its own.
This is intended for setting up new `qpcr.Assay` objects that do not
inherit data from one of the `qpcr.Readers`. Note that the currently stored dataframe is replaced without any checks,
so please make sure to retain the proper data structure!
Parameters
----------
df : pd.DataFrame
A pandas DataFrame.
"""
self._df = df
def names(self, as_set=True):
"""
Parameters
----------
as_set : bool
If `as_set = True` (default) it returns a set (as list without duplicates)
of assigned group names for replicate groups.
If `as_set = False` it returns the full group_name column (including all repeated entries).
Returns
-------
names : list or pd.Series
The given group names of all replicate groups.
"""
if "group_name" in self._df.columns:
if as_set:
return list(self._df["group_name"].unique())
else:
return self._df["group_name"]
else:
logger.warning(aw.AssayError("no_groupnames"))
return None
def groups(self, as_set=True):
"""
Parameters
----------
as_set : bool
If `as_set = True` (default) it returns a set (as list without duplicates)
of the numeric group identifiers of the replicate groups.
If `as_set = False` it returns the full group column (including all repeated entries).
Returns
-------
groups : list or pd.Series
The numeric group identifiers of all replicate groups.
"""
if "group" in self._df.columns:
groups = list(self._df["group"].unique()) if as_set else self._df["group"]
return groups
else:
logger.warning(aw.AssayError("setup_not_grouped"))
return None
def replicates(self, replicates: (int or tuple or str) = None):
"""
Either sets or gets the replicates settings to be used for grouping.
Before they are assigned, replicates are vetted to ensure they cover all data entries.
Parameters
----------
replicates : int or tuple or str
Can be an `integer` (equal group sizes, e.g. `3` for triplicates),
or a `tuple` (uneven group sizes, e.g. `(3,2,3)` if the second group is only a duplicate).
Another method to achieve the same thing is to specify a "formula" as a string of how to create a replicate tuple.
The allowed structure of such a formula is `n:m,` where `n` is the number of replicates in a group and `m` is the number of times
this pattern is repeated (if no `:m` is specified `:1` is assumed).
So, as an example, if there are 12 groups which are triplicates, but
at the end there is one which only has a single replicate (like the commonly measured diluent qPCR sample), we could either specify the tuple
individually as `replicates = (3,3,3,3,3,3,3,3,3,3,3,3,1)` or we use the formula to specify `replicates = "3:12,1"`. Of course, this works for
any arbitrary setting such as `"3:5,2:5,10,3:12"` (which specifies five triplicates, followed by five duplicates, a single decaplicate, and twelve triplicates again, truly a dataset from another dimension)...
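The vetting step mentioned above can be sketched as follows. This is a simplified standalone version under the assumption that an integer must divide the number of entries evenly and that a tuple must sum to it; the name `covers_all_entries` is hypothetical.

```python
def covers_all_entries(replicates, n_entries):
    # Check whether the replicate settings account for every data entry.
    if isinstance(replicates, int):
        # equal group sizes: the entries must divide evenly into groups
        return n_entries % replicates == 0
    if isinstance(replicates, tuple):
        # individual group sizes: they must add up to the number of entries
        return sum(replicates) == n_entries
    raise TypeError(f"cannot vet replicates of type {type(replicates).__name__}")


print(covers_all_entries(3, 12))                # True
print(covers_all_entries(3, 13))                # False
print(covers_all_entries((3, 3, 3, 3, 1), 13))  # True
```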
"""
if replicates is not None and self._df is not None:
# convert a string formula to tuple if one was provided
if isinstance(replicates, str):
replicates = self._reps_from_formula(replicates)
# vet replicate coverage
if self._vet_replicates(replicates):
self._replicates = replicates
else:
logger.critical(
aw.AssayError(
"reps_dont_cover", n_samples=self.n(), reps=replicates
)
)
raise aw.AssayError(
"reps_dont_cover", n_samples=self.n(), reps=replicates
)
return self._replicates
def group(self, replicates: (int or tuple or str) = None, infer_names=True):
"""
Groups the data according to replicates-settings specified.
Parameters
----------
replicates : int or tuple or str
The replicate settings after which to group the `Assay`. This will just
get forwarded to the `replicates` method, so there is no need to specify replicates
here if the replicates method has already been called.
See the documentation of the `Assay.replicates` method for more details.
infer_names : bool
Try to infer names of replicate groups based on the individual replicate sample identifiers.
Note that this only works if all replicates have an identical sample name!
"""
if replicates is not None:
self.replicates(replicates)
# generate group and group_names columns
if isinstance(self._replicates, int):
groups, group_names = self._make_equal_groups()
elif isinstance(self._replicates, tuple):
groups, group_names = self._make_unequal_groups()
else:
if self._identically_named():
groups = self._infer_replicates()
group_names = [defaults.group_name.format(i) for i in groups]
else:
e = aw.AssayError("no_reps_inferred", assay=self.id())
logger.critical(e)
raise e
# add numeric group identifiers
self._df["group"] = groups
self._df["group_name"] = group_names
if infer_names: # and self._names is None:
# infer group names
self._infer_names()
return self
def rename(self, names: (list or dict)):
"""
Replaces the current names of the replicate groups
(stored in the "group_name" column).
Parameters
----------
names : list or dict
Either a `list` (new names without repetitions) or `dict` (key = old name, value = new name) specifying new group names.
Group names only need to be specified once, and are applied to all replicate entries.
"""
# get new group names based on list (index) or dict (key)
if isinstance(names, (list, tuple, set)):
new_names = self._rename_per_index(names)
elif isinstance(names, dict):
new_names = self._rename_per_key(names)
else:
logger.error(aw.AssayError("no_groupname_assignment", names=names))
return
# update "group_name"
self._df["group_name"] = new_names
self._renamed = True
return self
def ignore(self, entries: tuple, drop=False):
"""
Remove lines based on index from the dataframe.
This is useful when removing corrupted data entries.
Parameters
----------
entries : tuple
Tuple of row indices from the dataframe to drop.
drop : bool
If True, the provided entries will be removed entirely from the
dataset. If False, the Ct values of the ignored entries will be set to NaN.
"""
if drop:
self._df = self._df.drop(index=list(entries))
else:
Cts = np.array(self.Ct, dtype=float)
# index with a list, since numpy reads a tuple as a multi-dimensional index
Cts[list(entries)] = np.nan
self._df["Ct"] = Cts
return self
def _reps_from_formula(self, replicates):
"""
Generates a replicate tuple from a string formula.
See the docstring of `replicates()` for more info on the formula.
Example:
"3:4,1:4,2:3,9" -> (3, 3, 3, 3, 1, 1, 1, 1, 2, 2, 2, 9)
"""
# split the formula and adjust standard formatting
replicates = replicates.split(",")
replicates = [i + ":1" if ":" not in i else i for i in replicates]
# convert to numeric values and extend
replicates = [np.array(i.split(":"), dtype=int) for i in replicates]
replicates = [np.tile(i[0], i[1]) for i in replicates]
# generate replicate tuple
replicates = np.concatenate(replicates)
replicates = tuple(replicates)
return replicates
def _infer_replicates(self):
"""
Infers the replicate groups based on the replicate ids in case all replicates of the same group have the same name.
"""
names = self._df[raw_col_names[0]]
names_set = names.unique()
groups = [i for i in range(len(names_set))]
for name, group in zip(names_set, groups):
names = names.replace(name, group)
indices = np.array(names, dtype=int)
return indices
def _infer_names(self):
"""
Infers replicate group names from the given replicate identifier column
"""
if self._identically_named():
self._df["group_name"] = self._df[raw_col_names[0]]
elif self._names is None:
logger.warning(aw.AssayError("groupnames_not_inferred"))
def _identically_named(self):
"""
Checks if all replicates in the same group have the same name / id
It checks simply the first group, if that is identical then it's fine.
"""
if "group" not in self._df.columns:
names = self._df[raw_col_names[0]]
names_set = names.unique()
# names_set = aux.sorted_set(names)
first_name = names_set[0]
group0 = self._df.query(f"{raw_col_names[0]} == '{first_name}'")[
raw_col_names[0]
]
entries = len(group0)
all_identical = entries > 1
else:
group0 = self._df.query("group == 0")[raw_col_names[0]]
all_identical = all(group0 == group0.iloc[0])
return all_identical
def _rename_per_key(self, names):
"""
Generates a new list of names based on the current names in "group_name" and uses str.replace()
to update the group names, based on key (old name) : value (new name) pairs.
Before applying, it checks that all groups are covered by the new names.
"""
current_names = self.names()
# current_names = aux.sorted_set(self._df["group_name"])
all_groups_covered = len(names) == len(current_names)
if all_groups_covered:
current_names = list(self._df["group_name"])
new_names = "$".join(current_names)
for old_name, new_name in names.items():
new_names = new_names.replace(old_name, new_name)
new_names = new_names.split("$")
return new_names
else:
e = aw.AssayError(
"groupnames_dont_colver",
current_groups=current_names,
new_received=names,
)
logger.critical(e)
raise e
def _rename_per_index(self, names):
"""
Generates a new list of names based on the current names in "group_name" and uses str.replace()
to update the group names to the new names based on index (using the order
of groups as currently present in "group_name").
"""
current_names_set = self.names()
# current_names_set = aux.sorted_set(self._df["group_name"])
all_groups_covered = len(names) == len(current_names_set)
if all_groups_covered:
current_names = list(self._df["group_name"])
new_names = "$".join(current_names)
names = list(names)
for old_name, new_name in zip(current_names_set, names):
new_names = new_names.replace(old_name, new_name)
new_names = new_names.split("$")
return new_names
else:
e = aw.AssayError(
"groupnames_dont_colver",
current_groups=current_names_set,
new_received=names,
)
logger.critical(e)
raise e
def _make_unequal_groups(self):
"""
Returns two lists of [0,0,0,1,1,1] and
[Group0, Group0, Group0, Group1,...]
to cover all data entries.
(this function works with a tuple for replicate group sizes)
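A standalone sketch of this expansion, assuming a name template of "Group{}" (inferred from the example names above; the actual template lives in `defaults.group_name`). The helper `make_groups` is hypothetical.

```python
def make_groups(sizes, name_template="Group{}"):
    # Expand a tuple of group sizes into per-entry group indices and names.
    groups, group_names = [], []
    for idx, size in enumerate(sizes):
        groups.extend([idx] * size)
        group_names.extend([name_template.format(idx)] * size)
    return groups, group_names


groups, group_names = make_groups((3, 1))
print(groups)       # [0, 0, 0, 1]
print(group_names)  # ['Group0', 'Group0', 'Group0', 'Group1']
```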
"""
groups = []
group_names = []
for rep, idx in zip(self._replicates, range(len(self._replicates))):
groups.extend([idx] * rep)
group_names.extend([defaults.group_name.format(idx)] * rep)
return groups, group_names
def _make_equal_groups(self):
"""
Returns two lists of [0,0,0,1,1,1] and
[Group0, Group0, Group0, Group1,...]
to cover all data entries.
(this function works with an integer group size,
assuming all groups have the same size)
"""
assays = self.n()
groups = []
group_names = []
slices = range(int(assays / self._replicates))
for i in slices:
groups.extend([i] * self._replicates)
group_names.extend([defaults.group_name.format(i)] * self._replicates)
return groups, group_names
def _vet_replicates(self, replicates: (int or tuple)):
"""
Checks if provided replicates will place all data entries into a group
returns True if all replicates are covered, False if not...
"""
current_entries = self.n()
verdict = None
# for INT -> modulo will be 0 if all replicates are covered
# for TUPLE -> sum(replicates) should cover all replicates...
if isinstance(replicates, int):
verdict = True if current_entries % replicates == 0 else False
elif isinstance(replicates, tuple):
verdict = True if sum(replicates) == current_entries else False
if verdict is None:
e = aw.AssayError("reps_could_not_vet", reps=replicates)
logger.error(e)
raise e
return verdict
def __str__(self):
_length = len(str(self._df).split("\n")[0])
s = f"""
{"-" * _length}
{self.__class__.__name__}: {self._id}
Amplif. Eff.: {self._efficiency}
{"-" * _length}
{self._df}
{"-" * _length}
""".strip()
return s
def __repr__(self):
id = self._id
eff = self._efficiency
n = len(self)
return f"{self.__class__.__name__}({id=}, {eff=}, {n=})"
def __iter__(self):
d = list(self._df.groupby("group"))
d = (i for _, i in d)
return d
def __len__(self):
return len(self._df)
def __setitem__(self, key, value):
self._df[key] = value
def __getitem__(self, key):
return self._df[key]
def __delitem__(self, key):
del self._df[key]