Source code for qpcr.main.Normaliser

"""
This is the ``qpcr.Normaliser`` class whose function is to compute Fold-Changes of Delta-Ct values
from assays and normalisers using ``qpcr.Assay`` objects.


Modes of *normalisation*
========================

The ``qpcr.Normaliser`` supports custom functions for normalisation. However, it also comes with three built-in methods to normalise sample-assays against normalisers.
These are accessible via the `mode` argument of the `qpcr.Normaliser.normalise` method, which can be set to ``"pair-wise"`` (default), ``"combinatoric"``, or ``"permutative"``. 

The default option `"pair-wise"` is computationally the fastest and will rigidly normalise replicates against their corresponding partner from the normaliser. I.e. first against first, 
second against second, etc. This mode is appropriate for multiplex qPCR experiments. 

For qPCR reactions that were pipetted individually, there is not reason to strictly only pair first
with first, second with second etc. For these cases there are other two options `"combinatoric"` and ``"permutative"``. ``"combinatoric"`` normalisation will calculate all possible group-wise 
combinations of a sample-assay replicate with all available normaliser replicates of the same group. I.e. first against first, and against second, etc. This will generate :math:`n^2` values where 
:math:`n` is the number of replicates within a group. This mode is appropriate for small-scale datasets but will substantially increase computation times for larger datasets.   

``"permutative"`` on the other hand will reflect the equivalence of replicates within a group through random permutations wihtin the normaliser replicates. Hence, 
first may be normalised against first, or second, etc. This normalisation method may by used iteratively to increase the accuracy. Replacement during permutations are allowed (although disabled by default).
If replacement is desired by the user, the probability of each replicate to be chosen will be weighted based on a fitted normal distribution. This method is appropriate for larger datasets for which combinatoric normalisation
is not desired. 


Normalising data
================

Setting up a ``qpcr.Normaliser`` is really easy and we see it in virtually every GitHub tutorial.

.. code-block::
    
    normaliser = qpcr.Normaliser()

    # and now directly pipe the data through
    results = normaliser.pipe( some_assays, some_normalisers )

Alternatively we can also directly use the ``qpcr.normalise`` function that will call on a Normaliser for us.

.. code-block::
    
    results = qpcr.normalise( some_assays, some_normalisers )


Preprocessing "normalisers"
---------------------------

The ``qpcr.Normaliser`` can work with multiple ``qpcr.Assay`` as "normaliser-assays". However, it requires for computation a single set of numbers to compute fold changes.
Therefore, the Normaliser first performs some *pre-processing* of all normaliser assays it receives. By default it will **compute their mean** and use this to compute fold changes.
However, the ``qpcr.Normaliser`` is equipped with a method ``prep_func`` which allows you to pass a custom function for preprocessing.

Normalisation
-------------

By default the ``qpcr.Normaliser`` will compute normalised fold changes by dividing the assays-of-interst by the pre-processed normaliser. 
As a side note here, the ``qpcr.Analyser`` already stores the Delta-Ct values it computes as :math:`efficiency^{-\Delta Ct}`. 
Hence, by default the Delta-Ct values stored by ``qpcr.Assay``s after having been "analysed" are exponentials.
However, using the method ``norm_func`` also a custom normalisation function can be specified.

"""

import qpcr.defaults as defaults
import qpcr._auxiliary.warnings as aw
import qpcr._auxiliary as aux

from qpcr.main.Assay import Assay
from qpcr.main.Results import Results

import pandas as pd
import numpy as np
import scipy.stats as scistats

from copy import deepcopy

logger = aux.default_logger()


[docs]class Normaliser(aux._ID): """ Handles the second step in Delta-Delta-Ct (normalisation against normaliser assays). Note ----- This requires that all have been analysed in the same way before! """ __slots__ = ["_Normalisers", "_Assays", "_Results", "_normaliser", "_prep_func", "_norm_func", "_norm_func_is_set"] def __init__(self): super().__init__() self._Normalisers = [] self._Assays = [] self._Results = Results() # the actually used normaliser # which will be a pre-processed version # from all supplied normalsier assays # by default this will be the averaged version # of all normaliser assays (but this can be changed) self._normaliser = None # setup defaults self._prep_func = self._preprocess_normalisers self._norm_func = self._divide_by_normaliser # store the state if a custom norm func was provided self._norm_func_is_set = False def __str__(self): s = f""" {self.__class__.__name__}: \t{self.id()} Prep.Function: \t{self._prep_func.__name__} Prep.Function: \t{self._norm_func.__name__} """.strip() if not self._Results.is_empty: _length = len(str(self._Results._df).split("\n")[0]) s = f"{'-' * _length}\n{s}\n{'-' * _length}\n{self._Results._df}\n{'-' * _length}" return s def __repr__(self): prep = self.prep_func.__name__ norm = self.norm_func.__name__ return f"{self.__class__.__name__}({prep=}, {norm=})"
[docs] def clear(self): """ Will clear the presently stored results """ self._Results = Results()
[docs] def prune(self, assays=True, normalisers=True, results=True): """ Will clear assays, normalisers, and/or results Parameters ---------- assays : bool Will clear any sample assays if True (default). results : bool Will clear any computed results if True (default). normalisers : bool Will clear any normalisers if True (default). """ if assays: self._Assays = [] if normalisers: self._Normalisers = [] self._normaliser = None if results: self.clear()
[docs] def get(self, copy=False): """ Parameters ---------- copy : bool Will return a deepcopy of the Results object if `copy = True` (default is `copy = False`). Returns ------- Results : qpcr.Results A `qpcr.Results` object containing the normalised dataframe """ if copy: return deepcopy(self._Results) return self._Results
[docs] def prep_func(self, f=None): """ Sets any defined function for combined normaliser pre-processing. If no `f` is provided, it returns the current `prep_func`. Parameters ---------- f : function The function may accept one list of `qpcr.Assay` objects, and must return either an `qpcr.Assay` object directly or a `pandas.Dataframe` (that will be migrated to an `qpcr.Assay`). The returned dataframe must contain a `"dCt"` column which stores the delta-Ct values ultimately used as "normaliser assay". """ if aux.same_type(f, aux.fileID): self._prep_func = f elif f is None: return f else: aw.HardWarning("Normaliser:cannot_set_prep_func", func=f)
[docs] def norm_func(self, f=None): """ Sets any defined function to perform normalisation of assays against normalisers. If no `f` is provided, it returns the current `norm_func`. Parameters ---------- f : function The function may accept two `qpcr.Assay` objects (named `assay` and `normaliser` which will be forwarded from the `qpcr.Normaliser`). The function may also accept one `pandas.DataFrame` (named `df`) containing two numeric columns of delta-Ct values from a sample-assay (named "s") and a normaliser-assay (named "n"), as well as a group identifier column (named "group"). Whatever inputs it works with, it must return a named numeric `pandas.Series` of the same length as entries in the Assays' dataframes. ##### Note Support for the dataframe direct usage will be dropped at some point in the future. By default `s/n` is used, where `s` is a column of sample-assay deltaCt values, and `n` is the corresponding `"dCt"` column from the normaliser. """ if aux.same_type(f, aux.fileID): self._norm_func = f self._norm_func_is_set = True elif f is None: return f else: aw.HardWarning("Normaliser:cannot_set_norm_func", func=f)
[docs] def normalise(self, mode="pair-wise", **kwargs): """ Normalises all linked assays against the combined pseudo-normaliser (by default, unless a custom `prep_func` has been specified), and stores the results in a new Results object. Parameters ---------- mode : str The normalisation mode to use. This can be either `pair-wise` (default), or `combinatoric`, or `permutative`. `pair-wise` will normalise replicates only by their partner (i.e. first against first, second by second, etc.). `combinatoric` will normalise all possible combinations of a replicate with all partner replicates of the same group from a normaliser (i.e. first against first, then second, then third, etc.). This will generate `n^2` normalised Delta-Delta-Ct values, where `n` is the number of replicates in a group. `permutative` will scramble the normaliser replicates randomly and then normalise pair-wise. This mode supports a parameter `k` which specifies the times this process should be repeated, thus generating `n * k` normalised Delta-Delta-Ct values. Also, through setting `replace = True` replacement may be allowed during normaliser scrambling. Note, this setting will be ignored if a custom `norm_func` is provided. **kwargs Any additional keyword arguments that may be passed to a custom `norm_func` and `prep_func` (both will receive the kwargs!). """ if self._normaliser is None: self._normaliser = self._prep_func(self._Normalisers, **kwargs) self._vet_normaliser() if self._Assays == [] or self._normaliser is None: e = aw.NormaliserError("no_data_yet") logger.error(e) # check which kind of norm_func we should use if not self._norm_func_is_set: # pair-wise is already default so we don't check... if mode == "combinatoric": self._norm_func = self._tile_normalise tiled = deepcopy(self._Assays[0]) tiled.tile() self._Results.setup_cols(tiled) del tiled elif mode == "permutative": self._norm_func = self._permutate_normalise n = aux.from_kwargs("k", 1, kwargs) tiled = deepcopy(self._Assays[0]) tiled.stack(n) self._Results.setup_cols(tiled) del tiled elif mode == "pair-wise": self._Results.setup_cols(self._Assays[0]) else: self._Results.setup_cols(self._Assays[0]) # perform normalisation for each assay for assay in self._Assays: # get data # assay_df = assay.get() # apply normalisation (delta-delta-Ct) normalised = self._norm_func_wrapper(assay=assay, normaliser=self._normaliser, **kwargs) # # store results in _Results # self._store_to_Results(assay, normalised) # and store results also in the Assay itself assay.add_ddCt(self._normaliser.id(), normalised) # and store to results self._Results.add_ddCt(assay)
[docs] def pipe(self, assays: list, normalisers: list, mode="pair-wise", **kwargs): """ A wrapper for `Normaliser.link` and `Normaliser.normalise` Parameters ---------- assays : list A list of `qpcr.Assay` objects. normalisers : list A list of `qpcr.Assay` objects. mode : str The normalisation mode to use. This can be either `pair-wise` (default), or `combinatoric`, or `permutative`. `pair-wise` will normalise replicates only by their partner (i.e. first against first, second by second, etc.). `combinatoric` will normalise all possible combinations of a replicate with all partner replicates of the same group from a normaliser (i.e. first against first, then second, then third, etc.). This will generate `n^2` normalised Delta-Delta-Ct values, where `n` is the number of replicates in a group. `permutative` will scramble the normaliser replicates randomly and then normalise pair-wise. This mode supports a parameter `k` which specifies the times this process should be repeated, thus generating `n * k` normalised Delta-Delta-Ct values. Also, through setting `replace = True` replacement may be allowed during normaliser scrambling. Note, this setting will be ignored if a custom `norm_func` is provided. **kwargs Any additional keyword arguments that may be passed to a custom `norm_func` and `prep_func` (both will receive the kwargs!). Returns ------- results : qpcr.Results A `qpcr.Results` object of the assembled results. """ self.link(assays=assays, normalisers=normalisers) self.normalise(mode=mode) return self.get()
def _vet_normaliser(self): """ Checks if the normaliser is already a qpcr.Assay object, and if not convert it to one. """ if not isinstance(self._normaliser, Assay): tmp = Assay(df=self._normaliser) tmp.id("combined_normaliser") self._normaliser = tmp def _store_to_Results(self, assay, normalised): """ Stores computed Delta-delta-ct (normalisation) into the _Results object. """ column_name = f"{assay.id()}_rel_{self._normaliser.id()}" normalised = normalised.rename(column_name) self._Results.add(normalised) def _norm_func_wrapper(self, assay, normaliser, **kwargs): """ The wrapper that will apply the _norm_func to the sample and normaliser dataframes and return a pandas series of normalised values """ # for double normalised we want the same columns as dct and norm... # FUTURE DROP HERE # In the future we will not be creating the tmp_df # directly anymore, but intead will only pass assay and normaliser. sample_dCt = assay.dCt groups = assay.groups(as_set=False) norm_dCt = normaliser.dCt tmp_df = pd.DataFrame(dict(group=groups, s=sample_dCt, n=norm_dCt)) results = self._norm_func(df=tmp_df, assay=assay, normaliser=normaliser, **kwargs) # this is the old call from before factoring out to Assays # dCt_col, norm_col = self._prep_columns(sample_assay, dCt_col, norm_col) # tmp_df = normaliser.join(sample_assay, lsuffix="_s") # # tmp_df = sample_assay.join(normaliser, rsuffix = "_n") # results = self._norm_func(tmp_df[[dCt_col, norm_col]], **kwargs) return results def _tile_normalise(self, assay, normaliser, **kwargs): """ Normalises assays and normalisers group wise, iteratively normalising each individual replicate against all replicates from the normaliser. This generates `n**2` normalised Delta-Delta-Ct values where `n` is the group size. """ # get the untiled data adf = assay.get() groups = assay.groups() ndf = normaliser.get() # get the column to draw data from col = aux.from_kwargs("col", "dCt", kwargs) # tile the assay assay.tile() # generate results array ddCts = np.zeros(len(assay.get())) # now compute ddCt idx = 0 for group in groups: a_dCt = adf.query(f"group == {group}")[col].to_numpy() n_dCt = ndf.query(f"group == {group}")[col].to_numpy() for a in a_dCt: for n in n_dCt: r = a / n try: ddCts[idx] = r except: break idx += 1 ddCts = pd.Series(ddCts) ddCts.name = "ddCt" return ddCts def _permutate_normalise(self, assay, normaliser, **kwargs): """ Scrambles randomly the normaliser's replicate values group-wise """ # get the untiled data adf = assay.get() groups = assay.groups() ndf = normaliser.get() # get the column to draw data from col = aux.from_kwargs("col", "dCt", kwargs) # stack the assay # get the number of permutations to perform n = aux.from_kwargs("k", 1, kwargs) assay.stack(n) # get replace argument for random choice replace = aux.from_kwargs("replace", False, kwargs) # generate results array ddCts = np.zeros(len(assay.get())) # now compute ddCt idx = 0 for group in groups: for i in range(n): a_dCt = adf.query(f"group == {group}")[col].to_numpy() n_dCt = ndf.query(f"group == {group}")[col].to_numpy() # randomly scramble the normaliser replicates np.random.seed(defaults.default_seed) if replace: # in case of replacement, we generate a normal # distribution from the replicate values and weigh # the random.choice with the given probabilities of the values # being chosen. Since the probabilities do not themselves correspond to 1 # we normalise against the limited available subset to generate a probabilities # array that sums up to 1. mu, sd = scistats.norm.fit(n_dCt) probs = scistats.norm.pdf(n_dCt, loc=mu, scale=sd) probs = probs / np.sum(probs) n_dCt = np.random.choice(n_dCt, size=n_dCt.size, replace=replace, p=probs) else: n_dCt = np.random.choice(n_dCt, size=n_dCt.size, replace=replace) length = a_dCt.size r = a_dCt / n_dCt try: ddCts[idx : idx + length] = r except Exception as e: logger.info(e) break idx += length ddCts = pd.Series(ddCts) ddCts.name = "ddCt" return ddCts def _divide_by_normaliser(self, df, **kwargs): """ Performs normalisation of sample s against normaliser n s and n are specified as two pandas dataframe columns Note, that the dataframe must ONLY contain these two columns, first the dCt sample, then the normaliser! (default _norm_func) """ s, n = df["s"], df["n"] # this is the old call from before factoring out to Assays # dCt_col, norm_col = df.columns # s, n = df[dCt_col], df[norm_col] return s / n def _link_assays(self, assays): """ Links any provided assays and checks their datatype in the process... """ if assays is not None: for assay in assays: # if aux.same_type(assay, Assay()): # for some reason the isinstance is not working again... # UPDATE: that's probably due to the way we import it without # the whole module stuff in front. Anyway, this one works and # it should be fine for our purposes although it's a bit hacky... if type(assay).__name__ == "Assay": if assay.id() in [i.id() for i in self._Assays]: return self._Assays.append(assay) else: e = aw.NormaliserError("unknown_data", s=assay) logger.error(e) def _link_normaliser(self, normalisers): """ Checks if normaliser is provided and has proper datatype to be added... """ if normalisers is not None: for normaliser in normalisers: # if aux.same_type(normaliser, Assay()): if type(normaliser).__name__ == "Assay": if normaliser.id() in [i.id() for i in self._Normalisers]: return self._Normalisers.append(normaliser) else: e = aw.NormaliserError("norm_unknown_data", s=normaliser) logger.error(e) def _preprocess_normalisers(self, *args, **kwargs): """ Averages the provided normalisers row-wise for all normalisers into a single combined normaliser, that will be stored as a new Assay object. """ # initialise new Results to store the dCt values form all normalisers combined = Results() # setup names using the first normaliser combined.setup_cols(self._Normalisers[0]) combined.adopt_id(self._Normalisers[0]) # now add all dCt columns from all normalisers for norm in self._Normalisers: combined.add_dCt(norm) # now generate the combined normaliser combined_normaliser = self._average(combined) combined_normaliser = combined_normaliser.rename("dCt") combined.add(combined_normaliser) logger.debug(combined) # now assemble the normaliser into a qpcr.Assay combined = combined.get() normaliser = Assay(df=combined, replicates=self._Normalisers[0].replicates(), group_names=self._Normalisers[0].names()) self._normaliser = normaliser self._update_combined_id() logger.debug(f"combined normaliser id: {self._normaliser.id()}") # forward combined_id to self and _Results self.adopt_id(self._normaliser) self._Results.adopt_id(self._normaliser) return self._normaliser def _update_combined_id(self): """ Generates a new id based on all normaliser ids, joining them as a+b+c,... """ ids = [N.id() for N in self._Normalisers] ids = "+".join(ids) self._normaliser.id_reset() self._normaliser.id(ids) def _average(self, combined): """ Averages row-wise all Normaliser entries and generates a series of their per-row means (default preprocess_normalisers function) """ tmp_df = combined.get() # drop group as it is a numeric column # and would otherwise skew the average if "group" in tmp_df.columns: tmp_df = tmp_df.drop(columns=["group"]) tmp_df = tmp_df.mean(axis=1, numeric_only=True) return tmp_df
__default_Normaliser__ = Normaliser() """The default Normaliser"""