Working with “Big Table” Datafiles
This notebook gives an example of how to read “Big Table” datafiles using qpcr.Readers. It makes use of the example data provided in the Example Data directory.
Experimental background
The corresponding experimental setup was as follows: levels of nonsense-mediated mRNA decay (NMD) sensitive (nmd) and insensitive (prot) transcript isoforms of HNRNPL and SRSF11 were measured by qPCR. As normalisers, both 28S rRNA and Actin transcript levels were measured. The replicates are biological triplicates and technical duplicates. All measurements from the same qPCR sample were merged into hexaplicates (6 replicates). This was done in two separate HeLa cell lines (one with a specific gene knockout (KO), and one without (WT)), which were each either subjected to a plasmid-mediated rescue (+) or not (-), leading to four experimental conditions:
cell line \ condition | rescue | no rescue |
---|---|---|
knockout | KO+ | KO- |
wildtype | WT+ | WT- |
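The replicate arithmetic of this design can be checked with a short sketch. Only the counts (two cell lines, two treatments, three biological and two technical replicates) come from the description above; the label strings and the tuple layout are illustrative assumptions and are not taken from the example files.

```python
from itertools import product

# Counts taken from the text; the labels themselves are illustrative only.
cell_lines = ["KO", "WT"]
treatments = ["+", "-"]
biological = [1, 2, 3]   # biological triplicates
technical = [1, 2]       # technical duplicates

# the four experimental conditions: KO+, KO-, WT+, WT-
conditions = [line + treatment for line, treatment in product(cell_lines, treatments)]

# every condition is measured in 3 x 2 = 6 wells (one hexaplicate)
wells = list(product(conditions, biological, technical))

replicates_per_condition = len(wells) // len(conditions)
print(conditions)
print(replicates_per_condition, len(wells))
```

Under these assumptions each assay contributes 24 Ct values in total, which agrees with the n=24 reported for every Assay in the reader output further down.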
[1]:
# import the qpcr module
import qpcr
from qpcr.Readers import BigTableReader
1 - Reading a “vertical” Big Table File
1.1 Setting up the DataReader
First, we set up the qpcr.DataReader.
[14]:
# our multi-assay Big Table datafile
file = "./Example Data/Big Table Files/vertical_bigtable_decorated.csv"
# set up the reader
reader = qpcr.DataReader()
1.2 Specify how to read the file
This datafile is a “vertical” Big Table file. That means it stores the assays one above the other and has an assay column, a sample (groups) column, and a Ct column. In fact, this example datafile has almost nothing else in it, but reading a file that contains additional columns would work just the same.
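To make the “vertical” layout concrete, here is a minimal sketch that uses only the Python standard library. It is not how qpcr parses the file internally; it merely shows how rows stacked on top of each other can be split into assays and groups. The column names match the ones used in this notebook, while the Ct values are invented for illustration.

```python
import csv
import io
from collections import defaultdict

# Toy data: invented Ct values, column names as used in this notebook.
raw = """Assay,Name,Ct
HNRNPL nmd,WT-,24.1
HNRNPL nmd,WT-,24.3
HNRNPL nmd,KO-,22.0
28S,WT-,9.8
28S,WT-,9.9
28S,KO-,10.1
"""

# collect the Ct values of each assay, keyed by sample (group) name
tables = defaultdict(lambda: defaultdict(list))
for row in csv.DictReader(io.StringIO(raw)):
    tables[row["Assay"]][row["Name"]].append(float(row["Ct"]))

for assay, groups in tables.items():
    print(assay, dict(groups))
```

Each key of `tables` corresponds to one assay, and each inner list holds the replicate Ct values of one group.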
In order to read a Big Table file, we need to either specify big_table = True for the DataReader or use the qpcr.Readers.BigTableReader directly.
The kind parameter
One important parameter we have to specify is the architecture of the table. The table in our file is “vertical”, so we need to specify kind = "vertical" to tell the Reader how to interpret the data it reads. The kind parameter tells the BigTableReader which parsing method to use to extract the individual assays. The other option is, correspondingly, kind = "horizontal".
Also, note that the filename already has “decorated” in it. That’s because the file already includes a @qpcr decorator column specifying which assays are “assays-of-interest” and which ones are normalisers. This way we can read the file and immediately pass the result on to our analysis.
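The effect of such a decorator column can be sketched with the standard library alone. This is a simplified illustration and not the library's implementation. It assumes that the column is named "@qpcr" and holds the labels "assay" or "normaliser"; check your own datafile for the labels it actually uses. The Ct values are invented.

```python
import csv
import io

# Toy table. ASSUMPTION: the decorator column is called "@qpcr" and holds
# "assay" or "normaliser"; the actual labels in a given file may differ.
raw = """Assay,Name,Ct,@qpcr
HNRNPL nmd,WT-,24.1,assay
HNRNPL prot,WT-,21.7,assay
28S,WT-,9.8,normaliser
"""

assays_of_interest, normalisers = [], []
for row in csv.DictReader(io.StringIO(raw)):
    # route each assay name into one of the two lists, based on the decorator
    target = normalisers if row["@qpcr"] == "normaliser" else assays_of_interest
    if row["Assay"] not in target:
        target.append(row["Assay"])

print(assays_of_interest, normalisers)
```

The two resulting lists mirror the two return values (assays and normalisers) that the reader hands back for a decorated file.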
[11]:
assays, normalisers = reader.read(
    filename = file,
    # specify the big table
    big_table = True,
    # specify that the data is decorated
    decorator = True,
    # specify the kind of big table
    kind = "vertical",
    # specify which columns store
    # the relevant data
    assay_col = "Assay",
    id_col = "Name",
    ct_col = "Ct"
)
# which yields
print( assays, normalisers )
[Assay(id='HNRNPL nmd', eff=1.0, n=24), Assay(id='HNRNPL prot', eff=1.0, n=24)] [Assay(id='28S', eff=1.0, n=24)]
Of course, we can also use the BigTableReader directly.
[15]:
reader = BigTableReader()
# read the file
assays, normalisers = reader.pipe(
    filename = file,
    # specify that the data is decorated
    decorator = True,
    # specify the kind of big table
    kind = "vertical",
    # specify which columns store
    # the relevant data
    assay_col = "Assay",
    id_col = "Name",
    ct_col = "Ct"
)
2 - Reading a “horizontal” Big Table File
Reading a “horizontal” Big Table file is actually much trickier than reading its vertical counterpart. Why is that? Because in “horizontal” Big Table files the replicates are aligned in columns next to each other, and they usually have different column headers as well. That means we can no longer rely on a single column or even a specific regex pattern to extract our values. Instead, we need to rely on qpcr decorators to help guide the parsing algorithm. To this end, we have the @qpcr:group decorator, which has to be placed in the cell immediately above the first replicate column of each group. Sounds too abstract? Check out the Getting Started notebook again and look at the example there (it’s actually not complicated at all).
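The idea of decorator-guided parsing can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the parsing algorithm that qpcr uses: the table is given as a list of rows, the decorators sit in the row directly above the column headers, and all headers, assay names, and Ct values are invented.

```python
# Toy horizontal table (invented values, hypothetical headers).
# Row 0 carries the decorators, row 1 the column headers, the rest are assays.
rows = [
    ["",       "@qpcr:group", "",     "@qpcr:group", ""],
    ["Target", "N1",          "N2",   "M1",          "M2"],
    ["miR-1",  "25.0",        "25.2", "27.1",        "27.3"],
    ["miR-2",  "30.4",        "30.1", "29.0",        "29.2"],
]
replicates = 2               # number of replicates per group
names = ["normal", "mild"]   # group names, in column order

decorators, header, *body = rows

# each decorator marks the first replicate column of one group
starts = [i for i, cell in enumerate(decorators) if cell == "@qpcr:group"]
id_col = header.index("Target")

parsed = {}
for row in body:
    # slice `replicates` columns starting at every decorated column
    parsed[row[id_col]] = {
        name: [float(ct) for ct in row[start:start + replicates]]
        for name, start in zip(names, starts)
    }

print(parsed)
```

Because the replicate columns carry unrelated headers (N1, N2, M1, M2), the decorator positions together with the number of replicates are the only information the sketch needs to locate each group.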
2.1 - Using the BigTableReader
We will now read a datafile that is not one of our normal example files. It’s a file from Yang et al. (2018), who analysed the expression of microRNAs (miRNAs) in valvular interstitial cells of dogs suffering from canine myxomatous mitral valve disease. The architecture of the datafile they uploaded to Zenodo is a “horizontal” Big Table containing miRNA assays on separate rows and replicate Ct values from “normal” (healthy), “mildly” ill, or “severely” ill dogs as three replicate groups of five replicates each.
Regrettably, they did not include the Ct values from the normaliser assays they used, so we will be unable to do anything useful beyond “file reading”.
Please note that the datafile we are reading is a processed version of the original file they uploaded: the separate sheets of the Excel file were merged into a single-sheet Big Table, and duplicate assays were dropped for the purpose of reading. Currently, the BigTableReader in "horizontal" parsing mode only supports uniquely labelled assays (so if there are duplicate assays, they must be labelled differently!).
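Before reading a horizontal file, it can help to check the assay labels for duplicates. The following sketch is one possible way to find repeated labels and make them unique by numbering them. It is a preprocessing idea and not part of qpcr, and the labels are hypothetical.

```python
from collections import Counter

# Hypothetical assay labels as they might appear in the id column.
labels = ["miR-1", "miR-2", "miR-1", "miR-3", "miR-1"]

counts = Counter(labels)
duplicates = [label for label, n in counts.items() if n > 1]

# make the labels unique by numbering every repeated occurrence
seen = Counter()
unique_labels = []
for label in labels:
    seen[label] += 1
    if counts[label] > 1:
        unique_labels.append(f"{label} ({seen[label]})")
    else:
        unique_labels.append(label)

print(duplicates)
print(unique_labels)
```

Whether to relabel or drop duplicate assays is a judgement call that depends on the dataset; for the file read below, the duplicates were dropped.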
[16]:
file = "./Example Data/Big Table Files/horizontal_bigtable_decorated.csv"
# we need to reset the reader, because
# it currently still stores the assays
# from the vertical big table file...
reader.clear()
assays, normalisers = reader.pipe(
    filename = file,
    # we specify that it's a horizontal table
    kind = "horizontal",
    # we **have to** specify the
    # number of replicates per group
    replicates = 5,
    # we also specify the group names (because they
    # will not be inferrable due to the different
    # column names of the replicates)
    names = ["normal", "mild", "severe"],
    # we must also specify which column specifies the assays
    # NOTE: in this mode this is handled by the `id_col` argument!
    id_col = "Target"
)
print( f"Loaded assays: {len(assays)}" )
Loaded assays: 286
And just like this we have read 286 new qpcr.Assay objects that we can now either process through native qpcr methods or use with external tools such as pandas, scipy, or numpy. After all, there is no need to restrict ourselves to using qpcr only for its “core business” of Delta-Delta-Ct analysis. We can use the data qpcr reads for any kind of analysis we want.
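As a small illustration of such an external analysis, the sketch below summarises the Ct values of one assay per replicate group using only the standard library. How the values are exported from a qpcr.Assay is not shown here; the dictionary simply stands in for Ct values that have already been extracted, and all numbers are invented.

```python
from statistics import mean, stdev

# Invented Ct values standing in for one assay's three replicate groups.
ct_values = {
    "normal": [25.0, 25.2, 24.9, 25.1, 25.3],
    "mild":   [26.0, 26.4, 26.1, 25.9, 26.1],
    "severe": [27.5, 27.2, 27.8, 27.4, 27.6],
}

# mean and sample standard deviation of the Ct values in every group
summary = {
    group: {"mean": round(mean(values), 2), "sd": round(stdev(values), 2)}
    for group, values in ct_values.items()
}

for group, stats in summary.items():
    print(group, stats)
```

The same per-group structure could just as well be handed to pandas, scipy, or numpy for more involved statistics.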