Working with “Big Table” Datafiles

This notebook gives an example of how to read “Big Table” datafiles using qpcr.Readers. It makes use of the example data provided in the Example Data directory.

Experimental background

The corresponding experimental setup was as follows: levels of nonsense-mediated mRNA decay (NMD) sensitive (nmd) and insensitive (prot) transcript isoforms of HNRNPL and SRSF11 were measured by qPCR. As normalisers, both 28S rRNA and Actin transcript levels were measured. The replicates are biological triplicates and technical duplicates. All measurements from the same qPCR sample were merged into hexaplicates (6 replicates). This was done in two separate HeLa cell lines (one with a specific gene knockout (KO), and one without (WT)), each of which was either treated with a plasmid-mediated rescue (+) or not (-), leading to four experimental conditions:

cell line \ condition	rescue	no rescue
knockout	KO+	KO-
wildtype	WT+	WT-
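
The replicate layout described above can be checked with a little arithmetic. The sketch below only uses the numbers stated in the setup (the variable names are made up for illustration) and counts how many Ct values each assay should contribute. The result would be consistent with the n=24 reported for each assay in the reader output further down.

```python
# Replicate layout as described in the experimental setup
biological_replicates = 3   # biological triplicates
technical_replicates = 2    # technical duplicates
cell_lines = ["KO", "WT"]
treatments = ["+", "-"]

# every condition is one cell line combined with one treatment
conditions = [line + treatment for line in cell_lines for treatment in treatments]

# merged hexaplicates, and the total number of Ct values per assay
replicates_per_condition = biological_replicates * technical_replicates
ct_values_per_assay = replicates_per_condition * len(conditions)

print(conditions)
print(replicates_per_condition, ct_values_per_assay)
```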

[1]:

# import the qpcr module
import qpcr
from qpcr.Readers import BigTableReader

1 - Reading a “vertical” Big Table File

1.1 Setting up the `DataReader`

First we set up the qpcr.DataReader.

[14]:

# our Big Table datafile (it contains several assays)
file = "./Example Data/Big Table Files/vertical_bigtable_decorated.csv"

# set up the reader
reader = qpcr.DataReader()

1.2 Specify how to read the file

This datafile is a “vertical” Big Table file. That means it stores the assays stacked on top of each other and has an assay column, a sample (group) column, and a Ct column. In fact, this example datafile has nothing else in it, but a file containing additional columns would be read in just the same way.
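
To make the “vertical” layout more tangible, here is a minimal sketch using only the Python standard library. It is not how qpcr parses the file internally. The miniature table, its values, and the helper `split_vertical` are hypothetical; only the column names `Assay`, `Name`, and `Ct` are taken from the example file used in this notebook.

```python
import csv
import io
from collections import defaultdict

# hypothetical miniature "vertical" big table (not the real example file)
raw = """Assay,Name,Ct
HNRNPL nmd,WT-,24.1
HNRNPL nmd,WT-,24.3
HNRNPL nmd,KO-,22.0
28S,WT-,8.1
28S,WT-,8.0
28S,KO-,8.2
"""

def split_vertical(text, assay_col, id_col, ct_col):
    """Group rows by the assay column, keeping (sample name, Ct) pairs."""
    assays = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        assays[row[assay_col]].append((row[id_col], float(row[ct_col])))
    return dict(assays)

tables = split_vertical(raw, assay_col="Assay", id_col="Name", ct_col="Ct")
for assay, rows in tables.items():
    print(assay, rows)
```

Because all assays share the same three columns, a single pass over the rows is enough to separate them.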

In order to read a Big Table file we need to specify big_table = True for the DataReader, or use the qpcr.Readers.BigTableReader directly.

The `kind` parameter

One important parameter we have to specify is the architecture of the table. The table in our file is “vertical” so we need to specify kind = "vertical" in order to tell the Reader how to interpret the data it reads. The kind parameter tells the BigTableReader which parsing method to use to extract the individual assays. The other option here would correspondingly be kind = "horizontal".

Also, note how the filename already has “decorated” in it? That’s because the file already includes a @qpcr decorator column specifying which assays are “assays-of-interest” and which ones are normalisers.

Like this we can read the file and immediately pass it on to our analysis.

[11]:

assays, normalisers = reader.read(
                                        filename = file,

                                        # specify the big table
                                        big_table = True,

                                        # specify that the data is decorated
                                        decorator = True,

                                        # specify the kind of big table
                                        kind = "vertical",

                                        # specify which columns store
                                        # the relevant data
                                        assay_col = "Assay",
                                        id_col = "Name",
                                        ct_col = "Ct"
                        )
# which yields
print( assays, normalisers )

[Assay(id='HNRNPL nmd', eff=1.0, n=24), Assay(id='HNRNPL prot', eff=1.0, n=24)] [Assay(id='28S', eff=1.0, n=24)]
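
The split into assays-of-interest and normalisers seen in the output above is driven by the decorator column. The following standard-library sketch illustrates the idea only. The column header `@qpcr`, the labels `assay` and `normaliser`, the data, and the helper `split_by_decorator` are all assumptions made for this example, so check the actual file and the qpcr documentation for the exact conventions.

```python
import csv
import io

# hypothetical decorated table; the "@qpcr" header and the labels
# "assay" / "normaliser" are assumptions made for this sketch
raw = """Assay,Name,Ct,@qpcr
HNRNPL nmd,WT-,24.1,assay
HNRNPL prot,WT-,21.7,assay
28S,WT-,8.1,normaliser
"""

def split_by_decorator(text, assay_col="Assay", decorator_col="@qpcr"):
    """Return (assays-of-interest, normalisers) as ordered lists of assay names."""
    of_interest, normalisers = [], []
    for row in csv.DictReader(io.StringIO(text)):
        # pick the target list based on the decorator label of this row
        target = normalisers if row[decorator_col] == "normaliser" else of_interest
        if row[assay_col] not in target:
            target.append(row[assay_col])
    return of_interest, normalisers

assay_names, normaliser_names = split_by_decorator(raw)
print(assay_names, normaliser_names)
```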

Of course, we can also use the BigTableReader directly.

[15]:

reader = BigTableReader()

# read the file
assays, normalisers = reader.pipe(
                                        filename = file,

                                        # specify that the data is decorated
                                        decorator = True,

                                        # specify the kind of big table
                                        kind = "vertical",

                                        # specify which columns store
                                        # the relevant data
                                        assay_col = "Assay",
                                        id_col = "Name",
                                        ct_col = "Ct"
                                )

2 - Reading a “horizontal” Big Table File

Reading a “horizontal” big table file is actually much trickier than reading its vertical counterpart. Why’s that? Well, because in “horizontal” big table files the replicates are aligned in columns next to each other, and they usually have different column headers as well. That means we can no longer rely on a single column or even a specific regex pattern to extract our values. Instead we need to rely on qpcr decorators to help guide the parsing algorithm. To this end, we have the @qpcr:group decorator, which has to be placed in the cell immediately above the first replicate column of each group. Sounds too abstract? Check out the Getting Started notebook again and look at the example there (it’s actually not complicated at all).
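
To illustrate why column names alone are not enough, here is a minimal standard-library sketch of chunking side-by-side replicate columns into groups. It is not the qpcr parsing algorithm and it ignores the @qpcr:group decorator: it simply assumes that every column other than the assay column holds Ct values, and that each group occupies a fixed number of consecutive columns. The table, its values, and the helper `split_horizontal` are hypothetical.

```python
import csv
import io

# hypothetical miniature "horizontal" table: one assay per row,
# replicate Ct values side by side under unrelated column headers
raw = """Target,dog_a,dog_b,dog_c,dog_d
miR-1,30.1,30.4,27.9,28.2
miR-2,25.0,25.2,26.1,26.3
"""

def split_horizontal(text, id_col, replicates, names):
    """Chunk each row's Ct columns into consecutive groups of `replicates` columns."""
    reader = csv.DictReader(io.StringIO(text))
    ct_cols = [col for col in reader.fieldnames if col != id_col]
    if len(ct_cols) != replicates * len(names):
        raise ValueError("replicates and names do not match the number of Ct columns")
    assays = {}
    for row in reader:
        groups = {}
        for i, name in enumerate(names):
            chunk = ct_cols[i * replicates:(i + 1) * replicates]
            groups[name] = [float(row[col]) for col in chunk]
        assays[row[id_col]] = groups
    return assays

tables = split_horizontal(raw, id_col="Target", replicates=2, names=["normal", "mild"])
print(tables["miR-1"])
```

Since the headers `dog_a` to `dog_d` carry no group information, the group names and the number of replicates per group have to be supplied from outside.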

2.1 - Using the `BigTableReader`

We will now read a datafile that is not one of our normal example files. It’s a file from Yang et al. (2018), who analysed the expression of micro-RNAs (miRNAs) in valvular interstitial cells of dogs suffering from Canine myxomatous mitral valve disease. The architecture of the datafile they uploaded to Zenodo is a “horizontal” big table containing assays of miRNAs on separate rows and replicate Ct values from “normal” (healthy), “mildly” ill, or “severely” ill dogs as three replicate groups of five replicates each.

Regrettably, they did not include the Ct values from the normaliser assays they used, so we will be unable to do anything useful beyond “file reading”.

Please note that the datafile we are reading is actually a processed version of the original file they uploaded, in which the separate sheets of the Excel file were merged into a single-sheet big table. Also, duplicate assays were dropped for the purpose of reading. Currently, the BigTableReader in "horizontal" parsing mode only supports uniquely labelled assays (so if there are duplicate assays they must be differently labelled!).

[16]:

file = "./Example Data/Big Table Files/horizontal_bigtable_decorated.csv"

# we need to reset the reader, because
# it currently still stores the assays
# from the vertical big table file...
reader.clear()

assays, normalisers = reader.pipe(
                                    filename = file,

                                    # we specify that it's a horizontal table
                                    kind = "horizontal",

                                    # we **have to** specify the
                                    # number of replicates per group
                                    replicates = 5,

                                    # we also specify the group names (because they
                                    # will not be inferrable due to the different
                                    # column names of the replicates)
                                    names = ["normal", "mild", "severe"],

                                    # we must also specify which column specifies the assays
                                    # NOTE: in this mode this is handled by the `id_col` argument!
                                    id_col = "Target"

                                )

print( f"Loaded assays: {len(assays)}" )

Loaded assays: 286

And just like this we have read 286 new qpcr.Assays that we can now either process through native qpcr methods, or we can use the data we just assembled with external tools such as pandas, scipy, or numpy. After all, there is no need to restrict ourselves to using qpcr only for its developed “core business” of Delta-Delta-Ct analysis. We can use the data qpcr reads for any kind of analysis we want.
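
As a small illustration of such a downstream analysis, the sketch below summarises Ct values per replicate group using only the Python standard library. It assumes the Ct values have already been extracted from a qpcr.Assay into plain Python lists (how to do that depends on the qpcr API and is not shown here). The numbers and the name `ct_by_group` are made up for this example.

```python
from statistics import mean, stdev

# hypothetical Ct values for one assay, keyed by replicate group
ct_by_group = {
    "normal": [30.1, 30.4, 29.8, 30.0, 30.2],
    "severe": [27.9, 28.2, 28.0, 27.7, 28.2],
}

# mean and sample standard deviation per group
summary = {group: (mean(values), stdev(values)) for group, values in ct_by_group.items()}

# difference of mean Ct between the two groups (no normalisation applied)
delta_ct = summary["severe"][0] - summary["normal"][0]

for group, (avg, sd) in summary.items():
    print(f"{group}: mean Ct = {avg:.2f}, sd = {sd:.2f}")
print(f"severe - normal = {delta_ct:.2f}")
```

Without normaliser assays this difference is not a Delta-Delta-Ct value, only a raw comparison of group means.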