Performing statistical tests

This notebook gives an example how to perform statistical tests on the computed results using the qpcr.stats module.

Experimental background

The corresponding experimental setup was as follows: Levels of Nonsense-mediated mRNA decay (NMD) sensitive (nmd) and insensitive (prot) transcript isoforms of HNRNPL and SRSF11 were measured by qPCR. As normalisers both 28S rRNA and Actin transcript levels were measured. The replicates are biological triplicates and technical douplicates. All measurements from the same qPCR sample were merged into hexaplicates (6 replicates). This was done in two separate HeLa cell lines (one with a specific gene knockout (KO), and one without (WT)), which were both treated to a plasmid-mediated rescue (+) or not (-), leading to four experimental conditions:

cell line \ condition

rescue

no rescue

knockout

KO+

KO-

wildtype

WT+

WT-

[1]:
# import qpcr and the stats module
import qpcr
import qpcr.stats as qstats

1 - Statistical tests

Using qpcr we can directly perform statistical tests on our data. The integrated qpcr.stats module offers multiple T-Tests to perform pairwise comparisons and ANOVA. Both analyses are offered either assay-wise or group-wise. In assay-wise mode comparisons we compare the different groups within each assay separately, whereas in group-wise mode we compare the different assays within each group separately.

The tests are conveniently implemented through a functional API, but originate from the dedicated qpcr.stats.Evaluator class - however, we most likely will never have to manually set this up ourselves. The functions (and the Evaluator) work directly with a qpcr.Results object and return a Comparison object (originally so named because of the multiple comparisons performed by T-Tests). Conveniently, the Comparisons are also directly attached to the Results object and are automatically integrated into the preview figures.

2 - Evaluating our data with T-Tests

2.1 Standard Workflow

We perform our standard workflow of loading data, followed by \(\Delta \Delta Ct\) analysis.

[2]:
# get our datafiles
normaliser_files = [
                    "./Example Data/28S.csv",
                    "./Example Data/actin.csv"
                ]

assay_files = [
                "./Example Data/HNRNPL_nmd.csv",
                "./Example Data/HNRNPL_prot.csv",
                "./Example Data/SRSF11_nmd.csv",
                "./Example Data/SRSF11_prot.csv",
            ]

# define our experimental parameters
reps = 6
group_names = ["WT-", "WT+", "KO-", "KO+"]

[3]:
# now read and analyse
all_assays = qpcr.read([assay_files, normaliser_files], replicates=reps, names=group_names)
all_assays = qpcr.delta_ct(all_assays)
results = qpcr.normalise(all_assays[0], all_assays[1])
results.stats()
[3]:
group group_name assay n mean stdev median IQR_(0.25, 0.75) CI_0.95
0 0 WT- HNRNPL_nmd_rel_28S+actin 6 1.050056 0.029452 1.050267 0.038511 [1.0161980755868047, 1.0839147340669306]
4 1 WT+ HNRNPL_nmd_rel_28S+actin 6 6.052860 0.890336 6.366251 1.516539 [5.029330209330722, 7.076390063511062]
8 2 KO- HNRNPL_nmd_rel_28S+actin 6 9.566500 0.513593 9.614924 0.734027 [8.976073861884581, 10.156926829088912]
12 3 KO+ HNRNPL_nmd_rel_28S+actin 6 16.940332 1.126687 16.964906 1.096016 [15.645093462670095, 18.235569627233723]
1 0 WT- HNRNPL_prot_rel_28S+actin 6 1.025239 0.040091 1.006244 0.028966 [0.9791503057437683, 1.0713284193830188]
5 1 WT+ HNRNPL_prot_rel_28S+actin 6 0.913758 0.050860 0.917271 0.042987 [0.8552897342904139, 0.9722270525874591]
9 2 KO- HNRNPL_prot_rel_28S+actin 6 0.856658 0.029906 0.862768 0.028070 [0.8222776478939225, 0.891038325780611]
13 3 KO+ HNRNPL_prot_rel_28S+actin 6 0.925665 0.057855 0.931949 0.073664 [0.859154910656904, 0.9921743211393722]
2 0 WT- SRSF11_nmd_rel_28S+actin 6 0.885403 0.102865 0.857620 0.172183 [0.7671489539924317, 1.0036564706179987]
6 1 WT+ SRSF11_nmd_rel_28S+actin 6 3.374373 0.638138 3.644576 0.813222 [2.6407697994029142, 4.107976488500923]
10 2 KO- SRSF11_nmd_rel_28S+actin 6 5.286670 0.347279 5.284115 0.661335 [4.8874376127697445, 5.685901849372261]
14 3 KO+ SRSF11_nmd_rel_28S+actin 6 7.604066 0.580553 7.770230 0.771464 [6.936662966465831, 8.271469831244389]
3 0 WT- SRSF11_prot_rel_28S+actin 6 1.009713 0.031999 0.997147 0.053565 [0.9729276202797885, 1.0464986201225999]
7 1 WT+ SRSF11_prot_rel_28S+actin 6 1.197983 0.076915 1.190168 0.126247 [1.1095617597701253, 1.2864044289031957]
11 2 KO- SRSF11_prot_rel_28S+actin 6 0.887404 0.060800 0.894811 0.102612 [0.8175079293860618, 0.9572992427668053]
15 3 KO+ SRSF11_prot_rel_28S+actin 6 1.136573 0.139244 1.096490 0.174390 [0.9764976866134043, 1.2966473268737058]

2.2 Performing T-tests on our results

Now we perform multiple T-tests to compare the different groups within our assays (i.e. “assaywise t-tests”). We can do so using the qpcr.stats.assaywise_ttests function.

[4]:
ttests = qstats.assaywise_ttests(results)
ttests.to_df() # get the test results as a pandas dataframe
[4]:
a b pval pval_adj stat effect_size id
0 KO+ KO+ NaN NaN NaN NaN HNRNPL_nmd_rel_28S+actin
1 KO+ KO- NaN NaN NaN NaN HNRNPL_nmd_rel_28S+actin
2 KO+ WT+ NaN NaN NaN NaN HNRNPL_nmd_rel_28S+actin
3 KO+ WT- NaN NaN NaN NaN HNRNPL_nmd_rel_28S+actin
4 KO- KO+ 1.091626e-07 1.637438e-07 -13.316149 7.373831 HNRNPL_nmd_rel_28S+actin
... ... ... ... ... ... ... ...
11 WT+ WT- NaN NaN NaN NaN SRSF11_prot_rel_28S+actin
12 WT- KO+ 7.518940e-02 9.022729e-02 -1.985438 0.126859 SRSF11_prot_rel_28S+actin
13 WT- KO- 2.598427e-03 5.196853e-03 3.980604 0.122310 SRSF11_prot_rel_28S+actin
14 WT- WT+ 4.966919e-04 1.490076e-03 -5.053499 0.188270 SRSF11_prot_rel_28S+actin
15 WT- WT- NaN NaN NaN NaN SRSF11_prot_rel_28S+actin

64 rows × 7 columns

And just like this we get a dataframe. You may notice the many NaN entries. This is because by default the tests only store the values of one comparison (i.e. “A vs B”, but not also “B vs A”, since they would be identical anyway). However, if you prefer a a “full picture”, you can use the make_symmetric method of the comparison object to get a “full” dataframe.

[5]:
ttests.make_symmetric()
ttests.to_df()
[5]:
a b pval pval_adj stat effect_size id
0 KO+ KO+ NaN NaN NaN NaN HNRNPL_nmd_rel_28S+actin
1 KO+ KO- 1.091626e-07 1.637438e-07 -13.316149 7.373831 HNRNPL_nmd_rel_28S+actin
2 KO+ WT+ 1.072654e-08 2.145308e-08 -16.953324 10.887471 HNRNPL_nmd_rel_28S+actin
3 KO+ WT- 2.423943e-11 7.271829e-11 -31.525713 15.890275 HNRNPL_nmd_rel_28S+actin
4 KO- KO+ 1.091626e-07 1.637438e-07 -13.316149 7.373831 HNRNPL_nmd_rel_28S+actin
... ... ... ... ... ... ... ...
11 WT+ WT- 4.966919e-04 1.490076e-03 -5.053499 0.188270 SRSF11_prot_rel_28S+actin
12 WT- KO+ 7.518940e-02 9.022729e-02 -1.985438 0.126859 SRSF11_prot_rel_28S+actin
13 WT- KO- 2.598427e-03 5.196853e-03 3.980604 0.122310 SRSF11_prot_rel_28S+actin
14 WT- WT+ 4.966919e-04 1.490076e-03 -5.053499 0.188270 SRSF11_prot_rel_28S+actin
15 WT- WT- NaN NaN NaN NaN SRSF11_prot_rel_28S+actin

64 rows × 7 columns

2.3 Selecting pairs to compare

Most often we do not actually want or need to compare all possible pairs but have a specific question in mind. We can provide pairs to compare specifically using the pairs argument. This argument accepts lists of group identifiers (or assay ids) or tuples thereof to selectively perform statistical tests.

[6]:
# compare only WT- and KO-
ttests = qstats.assaywise_ttests(results, pairs=["WT-", "KO-"])
ttests.to_df()
[6]:
a b pval pval_adj stat effect_size id
0 KO+ KO+ NaN NaN NaN NaN HNRNPL_nmd_rel_28S+actin
1 KO+ KO- NaN NaN NaN NaN HNRNPL_nmd_rel_28S+actin
2 KO+ WT+ NaN NaN NaN NaN HNRNPL_nmd_rel_28S+actin
3 KO+ WT- NaN NaN NaN NaN HNRNPL_nmd_rel_28S+actin
4 KO- KO+ NaN NaN NaN NaN HNRNPL_nmd_rel_28S+actin
... ... ... ... ... ... ... ...
11 WT+ WT- NaN NaN NaN NaN SRSF11_prot_rel_28S+actin
12 WT- KO+ NaN NaN NaN NaN SRSF11_prot_rel_28S+actin
13 WT- KO- 0.002598 0.002598 3.980604 0.12231 SRSF11_prot_rel_28S+actin
14 WT- WT+ NaN NaN NaN NaN SRSF11_prot_rel_28S+actin
15 WT- WT- NaN NaN NaN NaN SRSF11_prot_rel_28S+actin

64 rows × 7 columns

Or if we wish to compare only wildtype and knockout + vs - we would do:

[7]:
ttests = qstats.assaywise_ttests(results, pairs=[("WT-", "WT+"), ("KO-", "KO+")])
ttests.to_df()
[7]:
a b pval pval_adj stat effect_size id
0 KO+ KO+ NaN NaN NaN NaN HNRNPL_nmd_rel_28S+actin
1 KO+ KO- NaN NaN NaN NaN HNRNPL_nmd_rel_28S+actin
2 KO+ WT+ NaN NaN NaN NaN HNRNPL_nmd_rel_28S+actin
3 KO+ WT- NaN NaN NaN NaN HNRNPL_nmd_rel_28S+actin
4 KO- KO+ 1.091626e-07 1.903709e-07 -13.316149 7.373831 HNRNPL_nmd_rel_28S+actin
... ... ... ... ... ... ... ...
11 WT+ WT- NaN NaN NaN NaN SRSF11_prot_rel_28S+actin
12 WT- KO+ NaN NaN NaN NaN SRSF11_prot_rel_28S+actin
13 WT- KO- NaN NaN NaN NaN SRSF11_prot_rel_28S+actin
14 WT- WT+ 4.966919e-04 9.933838e-04 -5.053499 0.188270 SRSF11_prot_rel_28S+actin
15 WT- WT- NaN NaN NaN NaN SRSF11_prot_rel_28S+actin

64 rows × 7 columns

2.4 Visualising statistical comparisons

Conveniently, the results preview plotters automatically integrate the statistical tests into their figures. We can simply plot the results as usual and we will see the significance bars in the right places.

Note:

This visualisation is only available in static figures!

[9]:
fig = results.preview()
../_images/tutorials_10_statistical_tests_15_0.png

3.5 Styling the visualisation

The preview accepts some kwargs to further control the styling of significance bars. For once, using annotate_pvals we can switch the labelling on and off. Then, using pval_kws we can provide a dictionary with styling parameters.

[12]:
fig = results.preview(figsize=(8,5), pval_kws=dict(style="*"))
../_images/tutorials_10_statistical_tests_17_0.png

3 - Evaluating our data with ANOVA

Instead of comparing individual pairs within our data, we can also check if there is overall variance across all groups per assay. To do this, we can use assaywise_anova instead of assaywise_ttests. The resulting table summarizes the p-value in each analysed assay.

[13]:
anova = qstats.assaywise_anova(results)
anova.to_df()
[13]:
id pval stat
0 HNRNPL_nmd_rel_28S+actin 7.798242e-18 383.446724
0 HNRNPL_prot_rel_28S+actin 1.239437e-04 11.642482
0 SRSF11_nmd_rel_28S+actin 9.027072e-15 185.855714
0 SRSF11_prot_rel_28S+actin 7.159838e-05 12.701184

Of course, if in your data the groups are are more meaningful than assays, then qpcr also offers the same functions in group-wise mode through groupwise_ttests and groupwise_anova.

Alright, with this we have reached the end of this tutorial on the statistical tests that can be directly performed using qpcr - congrats!