Filtering the data

This notebook gives an example of how to pre-filter your raw Ct data. The qpcr module contains the qpcr.Filters submodule, which defines two types of filters: the RangeFilter and the IQRFilter. Both vet the input data and remove likely faulty measurements. This keeps your data clean while leaving the datafiles themselves untouched.

Experimental background

The corresponding experimental setup was as follows: Levels of Nonsense-mediated mRNA decay (NMD) sensitive (nmd) and insensitive (prot) transcript isoforms of HNRNPL and SRSF11 were measured by qPCR. As normalisers, both 28S rRNA and Actin transcript levels were measured. The replicates are biological triplicates and technical duplicates. All measurements from the same qPCR sample were merged into hexaplicates (6 replicates). This was done in two separate HeLa cell lines (one with a specific gene knockout (KO), and one without (WT)), each of which was either treated with a plasmid-mediated rescue (+) or not (-), leading to four experimental conditions:

cell line \ condition    rescue    no rescue
knockout                 KO+       KO-
wildtype                 WT+       WT-

Simple Analysis

In this example we will perform the same basic Delta-Delta Ct analysis we did in the previous tutorials. However, this time we will add a RangeFilter to our analysis.

As its name suggests, the RangeFilter filters out any data point that falls outside a given range around an anchor within each group of replicates. By default this is +/- 1 around the group median, but both the range and the anchor can be adjusted manually.
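To make the inclusion rule concrete, here is a small illustration (plain numpy, not the qpcr API) of what a +/- 1 window around the group median means for one hypothetical group of six Ct replicates:

import numpy as np

# one hypothetical group of six Ct replicates
ct_values = np.array([24.1, 24.3, 23.9, 24.2, 27.5, 24.0])

anchor = np.median(ct_values)           # default anchor: the group median
lower, upper = anchor - 1, anchor + 1   # default range: +/- 1 around the anchor

kept = ct_values[(ct_values >= lower) & (ct_values <= upper)]
print(kept)   # the 27.5 replicate falls outside the window and would be removed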

The handy thing is that the BasicPlus pipeline supports adding Filters just like it supports adding Plotters. However, we will also discuss how to use Filters manually when assembling your pipeline yourself.

[1]:
# import what we need
import qpcr
from qpcr.Pipes import BasicPlus
from qpcr.Filters import RangeFilter, IQRFilter

We will set up the data right at the start so that's out of the way. We have discussed how to provide data to the classes of the main qpcr module and to the qpcr.Pipes pipelines in the last tutorials. If any of these steps are unclear to you, please check out those tutorials.

[2]:
# get our datafiles
normaliser_files = [
    "./Example Data/28S.csv",
    "./Example Data/actin.csv",
]

sample_files = [
    "./Example Data/HNRNPL_nmd.csv",
    "./Example Data/HNRNPL_prot.csv",
    "./Example Data/SRSF11_nmd.csv",
    "./Example Data/SRSF11_prot.csv",
]

# define our experimental parameters
reps = 6
group_names = ["WT-", "WT+", "KO-", "KO+"]

Tutorial 1 - Filters and the BasicPlus pipeline

1.1 Setting up the BasicPlus pipeline

We already discussed setting up a pipeline in the last tutorial (2_pipeline_tutorial.ipynb), so we will not go over the details here again.

[3]:
# setup the pipeline
pipeline = BasicPlus()

1.3 Adding the RangeFilter

This is the moment we have been waiting for. We can add the RangeFilter just like we would add a Plotter such as the PreviewResults figure class; the procedure is identical.

Filters not only remove faulty data points, they also generate a report of their work, recording which entries they removed and under which settings they were run. In addition, they support visualising the data that passed through them before and after filtering using boxplots. The filter we define below will generate a static boxplot summary of its filtering activity.

[4]:
# setup the filter
filter = RangeFilter()
pipeline.add_filters(filter)

1.4 Feeding our data

Next, we feed our experimental data to the pipeline.

[5]:
pipeline.replicates(reps)                    # 6 replicates per group
pipeline.names(group_names)                  # "WT-", "WT+", "KO-", "KO+"
pipeline.add_assays(sample_files)            # assays of interest (HNRNPL and SRSF11)
pipeline.add_normalisers(normaliser_files)   # normalisers (28S and actin)

1.5 Running everything

Now that we are all set up, we are ready to go and run the pipeline.

[6]:
# run the pipeline
pipeline.run()

# and get results
results = pipeline.results()
fig = results.preview()
[output figures: ../_images/tutorials_3_filters_12_0.png and ../_images/tutorials_3_filters_12_1.png]

Notice how our filtering did not actually remove any data points. This is because even though some data points are considered outliers when visualising the data in box plots, they still all lie within the static inclusion range. To remove these data points, we can either define a narrower inclusion range using the filter’s set_lim() method, or choose the IQRFilter instead of the static RangeFilter.
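For example, a narrower window could be configured before the filter is added to the pipeline. The exact signature of set_lim() is not shown in this tutorial, so treat the call below as a sketch and check help(RangeFilter.set_lim) for the real arguments:

# sketch only: a stricter RangeFilter (assumed: set_lim accepts a symmetric limit)
strict_filter = RangeFilter()
strict_filter.set_lim(0.5)            # assumption: +/- 0.5 around the anchor instead of +/- 1
pipeline.add_filters(strict_filter)   # then used exactly like the default filter above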

Tutorial 2 - Manual use of Filters

We have already seen how to manually implement an analysis pipeline in the first tutorial in 1_manual_tutorial.ipynb.

The filter is simply one more step we can implement after reading the data using a qpcr.DataReader and before passing the data on to a qpcr.Analyser. Conveniently, just like the qpcr.Analyser, the Filters are also equipped with a pipe() method that takes in a qpcr.Assay object and returns it with only the entries that passed the filter.
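As a minimal sketch of that pattern (my_assay stands for a single, hypothetical qpcr.Assay object obtained from a DataReader):

single_filter = IQRFilter()
filtered_assay = single_filter.pipe(my_assay)   # the same Assay, minus the entries that failed the filter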

2.1 Setting up the pipeline manually

We have discussed these steps in the first tutorial, so all clear here :-)

[7]:
# setup the pipeline, starting with the DataReader
reader = qpcr.DataReader()        # reads the datafiles
analyser = qpcr.Analyser()        # computes Delta-Ct
normaliser = qpcr.Normaliser()    # normalises the assays against the normalisers

2.2 Setting up the IQRFilter

This time we will actually use the IQRFilter instead of the RangeFilter. We also want to save the reports about the filtering and the boxplots. To do this, we must specify the directory the reports should be saved to using the report() method. (When using the BasicPlus pipeline, the report location is conveniently handled for us automatically.)

[8]:
iqrfilter = IQRFilter()
iqrfilter.report("Example Results")

2.3 Assembling the pipeline and including the Filter

Now we are ready to assemble our pipeline again. We can simply reuse the setup from the first tutorial and add the filter between the DataReader’s read()-step and the Analyser’s pipe()-step.

[9]:
# now feed in our data

# we first deal with the normalisers 28S and actin
normalisers = reader.read(normaliser_files, replicates = reps, names = group_names)

# here we filter our data
normalisers = iqrfilter.pipe(normalisers)

# and now we compute Delta-Ct
normalisers = analyser.pipe(normalisers)

# and we do the same for our assays-of-interest of HNRPL and SRSF11
assays = reader.read(sample_files, replicates = reps, names = group_names)
assays = iqrfilter.pipe(assays)
assays = analyser.pipe(assays)

2.4 Generate the Filtering Summary

The boxplots from filtering must be generated after all the filtering is done (otherwise it would not be much of a summary). Therefore, we can call the filter's plot() method at essentially any point after the last qpcr.Analyser step has finished. To do this, all we need to call is:

[10]:
fig = iqrfilter.plot()
[output figure: ../_images/tutorials_3_filters_22_0.png]

Note how this time the summaries actually show differences between the pre- and post-filtered datasets. That is because the IQRFilter is not bound to the static +/- 1 range of the RangeFilter but filters dynamically according to the spread of the data.

So why are there "new outliers" in the post-filter summary? Because the boxplot is assembled anew from the reduced data, the median and the first and third quartiles are re-calculated, which can produce "new outliers" now that the reference parameters have changed. These, however, are purely an artefact of the plotting method, not of the filtering itself.
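For reference, an IQR-based inclusion rule typically corresponds to the classic interquartile-range fences sketched below; the IQRFilter's actual default scaling factor is not stated in this tutorial, so the 1.5 used here is an assumption:

import numpy as np

# the same hypothetical group of Ct replicates as in the RangeFilter illustration
ct_values = np.array([24.1, 24.3, 23.9, 24.2, 27.5, 24.0])

q1, q3 = np.percentile(ct_values, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # assumed scaling factor of 1.5
kept = ct_values[(ct_values >= lower) & (ct_values <= upper)]
print(kept)   # the fences scale with the spread of the data rather than a fixed +/- 1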

2.5 Finishing up

From here on it’s just business as usual to get the data from the analysis, just as in the first tutorial. Since this has nothing to do with the Filters per se, we will stop at this point.

You are now able to use the qpcr.Filters in both the pre-implemented BasicPlus pipeline and in your own manually assembled pipeline. Good job!