Performing statistical tests

This notebook gives an example how to perform statistical tests on the computed results using the qpcr.stats module.

Experimental background

The corresponding experimental setup was as follows: Levels of Nonsense-mediated mRNA decay (NMD) sensitive (nmd) and insensitive (prot) transcript isoforms of HNRNPL and SRSF11 were measured by qPCR. As normalisers both 28S rRNA and Actin transcript levels were measured. The replicates are biological triplicates and technical douplicates. All measurements from the same qPCR sample were merged into hexaplicates (6 replicates). This was done in two separate HeLa cell lines (one with a specific gene knockout (KO), and one without (WT)), which were both treated to a plasmid-mediated rescue (+) or not (-), leading to four experimental conditions:

cell line \ condition	rescue	no rescue
knockout	KO+	KO-
wildtype	WT+	WT-

[1]:

# import qpcr and the stats module
import qpcr
import qpcr.stats as qstats

1 - Statistical tests

Using qpcr we can directly perform statistical tests on our data. The integrated qpcr.stats module offers multiple T-Tests to perform pairwise comparisons and ANOVA. Both analyses are offered either assay-wise or group-wise. In assay-wise mode comparisons we compare the different groups within each assay separately, whereas in group-wise mode we compare the different assays within each group separately.

The tests are conveniently implemented through a functional API, but originate from the dedicated qpcr.stats.Evaluator class - however, we most likely will never have to manually set this up ourselves. The functions (and the Evaluator) work directly with a qpcr.Results object and return a Comparison object (originally so named because of the multiple comparisons performed by T-Tests). Conveniently, the Comparisons are also directly attached to the Results object and are automatically integrated into the preview figures.

2 - Evaluating our data with T-Tests

2.1 Standard Workflow

We perform our standard workflow of loading data, followed by \(\Delta \Delta Ct\) analysis.

[2]:

# get our datafiles
normaliser_files = [
                    "./Example Data/28S.csv",
                    "./Example Data/actin.csv"
                ]

assay_files = [
                "./Example Data/HNRNPL_nmd.csv",
                "./Example Data/HNRNPL_prot.csv",
                "./Example Data/SRSF11_nmd.csv",
                "./Example Data/SRSF11_prot.csv",
            ]

# define our experimental parameters
reps = 6
group_names = ["WT-", "WT+", "KO-", "KO+"]

[3]:

# now read and analyse
all_assays = qpcr.read([assay_files, normaliser_files], replicates=reps, names=group_names)
all_assays = qpcr.delta_ct(all_assays)
results = qpcr.normalise(all_assays[0], all_assays[1])
results.stats()

[3]:

	group	group_name	assay	n	mean	stdev	median	IQR_(0.25, 0.75)	CI_0.95
0	0	WT-	HNRNPL_nmd_rel_28S+actin	6	1.050056	0.029452	1.050267	0.038511	[1.0161980755868047, 1.0839147340669306]
4	1	WT+	HNRNPL_nmd_rel_28S+actin	6	6.052860	0.890336	6.366251	1.516539	[5.029330209330722, 7.076390063511062]
8	2	KO-	HNRNPL_nmd_rel_28S+actin	6	9.566500	0.513593	9.614924	0.734027	[8.976073861884581, 10.156926829088912]
12	3	KO+	HNRNPL_nmd_rel_28S+actin	6	16.940332	1.126687	16.964906	1.096016	[15.645093462670095, 18.235569627233723]
1	0	WT-	HNRNPL_prot_rel_28S+actin	6	1.025239	0.040091	1.006244	0.028966	[0.9791503057437683, 1.0713284193830188]
5	1	WT+	HNRNPL_prot_rel_28S+actin	6	0.913758	0.050860	0.917271	0.042987	[0.8552897342904139, 0.9722270525874591]
9	2	KO-	HNRNPL_prot_rel_28S+actin	6	0.856658	0.029906	0.862768	0.028070	[0.8222776478939225, 0.891038325780611]
13	3	KO+	HNRNPL_prot_rel_28S+actin	6	0.925665	0.057855	0.931949	0.073664	[0.859154910656904, 0.9921743211393722]
2	0	WT-	SRSF11_nmd_rel_28S+actin	6	0.885403	0.102865	0.857620	0.172183	[0.7671489539924317, 1.0036564706179987]
6	1	WT+	SRSF11_nmd_rel_28S+actin	6	3.374373	0.638138	3.644576	0.813222	[2.6407697994029142, 4.107976488500923]
10	2	KO-	SRSF11_nmd_rel_28S+actin	6	5.286670	0.347279	5.284115	0.661335	[4.8874376127697445, 5.685901849372261]
14	3	KO+	SRSF11_nmd_rel_28S+actin	6	7.604066	0.580553	7.770230	0.771464	[6.936662966465831, 8.271469831244389]
3	0	WT-	SRSF11_prot_rel_28S+actin	6	1.009713	0.031999	0.997147	0.053565	[0.9729276202797885, 1.0464986201225999]
7	1	WT+	SRSF11_prot_rel_28S+actin	6	1.197983	0.076915	1.190168	0.126247	[1.1095617597701253, 1.2864044289031957]
11	2	KO-	SRSF11_prot_rel_28S+actin	6	0.887404	0.060800	0.894811	0.102612	[0.8175079293860618, 0.9572992427668053]
15	3	KO+	SRSF11_prot_rel_28S+actin	6	1.136573	0.139244	1.096490	0.174390	[0.9764976866134043, 1.2966473268737058]

2.2 Performing T-tests on our results

Now we perform multiple T-tests to compare the different groups within our assays (i.e. “assaywise t-tests”). We can do so using the qpcr.stats.assaywise_ttests function.

[4]:

ttests = qstats.assaywise_ttests(results)
ttests.to_df() # get the test results as a pandas dataframe

[4]:

	a	b	pval	pval_adj	stat	effect_size	id
0	KO+	KO+	NaN	NaN	NaN	NaN	HNRNPL_nmd_rel_28S+actin
1	KO+	KO-	NaN	NaN	NaN	NaN	HNRNPL_nmd_rel_28S+actin
2	KO+	WT+	NaN	NaN	NaN	NaN	HNRNPL_nmd_rel_28S+actin
3	KO+	WT-	NaN	NaN	NaN	NaN	HNRNPL_nmd_rel_28S+actin
4	KO-	KO+	1.091626e-07	1.637438e-07	-13.316149	7.373831	HNRNPL_nmd_rel_28S+actin
...	...	...	...	...	...	...	...
11	WT+	WT-	NaN	NaN	NaN	NaN	SRSF11_prot_rel_28S+actin
12	WT-	KO+	7.518940e-02	9.022729e-02	-1.985438	0.126859	SRSF11_prot_rel_28S+actin
13	WT-	KO-	2.598427e-03	5.196853e-03	3.980604	0.122310	SRSF11_prot_rel_28S+actin
14	WT-	WT+	4.966919e-04	1.490076e-03	-5.053499	0.188270	SRSF11_prot_rel_28S+actin
15	WT-	WT-	NaN	NaN	NaN	NaN	SRSF11_prot_rel_28S+actin

64 rows × 7 columns

And just like this we get a dataframe. You may notice the many NaN entries. This is because by default the tests only store the values of one comparison (i.e. “A vs B”, but not also “B vs A”, since they would be identical anyway). However, if you prefer a a “full picture”, you can use the make_symmetric method of the comparison object to get a “full” dataframe.

[5]:

ttests.make_symmetric()
ttests.to_df()

[5]:

	a	b	pval	pval_adj	stat	effect_size	id
0	KO+	KO+	NaN	NaN	NaN	NaN	HNRNPL_nmd_rel_28S+actin
1	KO+	KO-	1.091626e-07	1.637438e-07	-13.316149	7.373831	HNRNPL_nmd_rel_28S+actin
2	KO+	WT+	1.072654e-08	2.145308e-08	-16.953324	10.887471	HNRNPL_nmd_rel_28S+actin
3	KO+	WT-	2.423943e-11	7.271829e-11	-31.525713	15.890275	HNRNPL_nmd_rel_28S+actin
4	KO-	KO+	1.091626e-07	1.637438e-07	-13.316149	7.373831	HNRNPL_nmd_rel_28S+actin
...	...	...	...	...	...	...	...
11	WT+	WT-	4.966919e-04	1.490076e-03	-5.053499	0.188270	SRSF11_prot_rel_28S+actin
12	WT-	KO+	7.518940e-02	9.022729e-02	-1.985438	0.126859	SRSF11_prot_rel_28S+actin
13	WT-	KO-	2.598427e-03	5.196853e-03	3.980604	0.122310	SRSF11_prot_rel_28S+actin
14	WT-	WT+	4.966919e-04	1.490076e-03	-5.053499	0.188270	SRSF11_prot_rel_28S+actin
15	WT-	WT-	NaN	NaN	NaN	NaN	SRSF11_prot_rel_28S+actin

64 rows × 7 columns

2.3 Selecting pairs to compare

Most often we do not actually want or need to compare all possible pairs but have a specific question in mind. We can provide pairs to compare specifically using the pairs argument. This argument accepts lists of group identifiers (or assay ids) or tuples thereof to selectively perform statistical tests.

[6]:

# compare only WT- and KO-
ttests = qstats.assaywise_ttests(results, pairs=["WT-", "KO-"])
ttests.to_df()

[6]:

	a	b	pval	pval_adj	stat	effect_size	id
0	KO+	KO+	NaN	NaN	NaN	NaN	HNRNPL_nmd_rel_28S+actin
1	KO+	KO-	NaN	NaN	NaN	NaN	HNRNPL_nmd_rel_28S+actin
2	KO+	WT+	NaN	NaN	NaN	NaN	HNRNPL_nmd_rel_28S+actin
3	KO+	WT-	NaN	NaN	NaN	NaN	HNRNPL_nmd_rel_28S+actin
4	KO-	KO+	NaN	NaN	NaN	NaN	HNRNPL_nmd_rel_28S+actin
...	...	...	...	...	...	...	...
11	WT+	WT-	NaN	NaN	NaN	NaN	SRSF11_prot_rel_28S+actin
12	WT-	KO+	NaN	NaN	NaN	NaN	SRSF11_prot_rel_28S+actin
13	WT-	KO-	0.002598	0.002598	3.980604	0.12231	SRSF11_prot_rel_28S+actin
14	WT-	WT+	NaN	NaN	NaN	NaN	SRSF11_prot_rel_28S+actin
15	WT-	WT-	NaN	NaN	NaN	NaN	SRSF11_prot_rel_28S+actin

64 rows × 7 columns

Or if we wish to compare only wildtype and knockout + vs - we would do:

[7]:

ttests = qstats.assaywise_ttests(results, pairs=[("WT-", "WT+"), ("KO-", "KO+")])
ttests.to_df()

[7]:

	a	b	pval	pval_adj	stat	effect_size	id
0	KO+	KO+	NaN	NaN	NaN	NaN	HNRNPL_nmd_rel_28S+actin
1	KO+	KO-	NaN	NaN	NaN	NaN	HNRNPL_nmd_rel_28S+actin
2	KO+	WT+	NaN	NaN	NaN	NaN	HNRNPL_nmd_rel_28S+actin
3	KO+	WT-	NaN	NaN	NaN	NaN	HNRNPL_nmd_rel_28S+actin
4	KO-	KO+	1.091626e-07	1.903709e-07	-13.316149	7.373831	HNRNPL_nmd_rel_28S+actin
...	...	...	...	...	...	...	...
11	WT+	WT-	NaN	NaN	NaN	NaN	SRSF11_prot_rel_28S+actin
12	WT-	KO+	NaN	NaN	NaN	NaN	SRSF11_prot_rel_28S+actin
13	WT-	KO-	NaN	NaN	NaN	NaN	SRSF11_prot_rel_28S+actin
14	WT-	WT+	4.966919e-04	9.933838e-04	-5.053499	0.188270	SRSF11_prot_rel_28S+actin
15	WT-	WT-	NaN	NaN	NaN	NaN	SRSF11_prot_rel_28S+actin

64 rows × 7 columns

2.4 Visualising statistical comparisons

Conveniently, the results preview plotters automatically integrate the statistical tests into their figures. We can simply plot the results as usual and we will see the significance bars in the right places.

Note:

This visualisation is only available in static figures!

[9]:

fig = results.preview()

../_images/tutorials_10_statistical_tests_15_0.png

3.5 Styling the visualisation

The preview accepts some kwargs to further control the styling of significance bars. For once, using annotate_pvals we can switch the labelling on and off. Then, using pval_kws we can provide a dictionary with styling parameters.

[12]:

fig = results.preview(figsize=(8,5), pval_kws=dict(style="*"))

../_images/tutorials_10_statistical_tests_17_0.png

3 - Evaluating our data with ANOVA

Instead of comparing individual pairs within our data, we can also check if there is overall variance across all groups per assay. To do this, we can use assaywise_anova instead of assaywise_ttests. The resulting table summarizes the p-value in each analysed assay.

[13]:

anova = qstats.assaywise_anova(results)
anova.to_df()

[13]:

id	pval	stat
HNRNPL_nmd_rel_28S+actin	7.798242e-18	383.446724
HNRNPL_prot_rel_28S+actin	1.239437e-04	11.642482
SRSF11_nmd_rel_28S+actin	9.027072e-15	185.855714
SRSF11_prot_rel_28S+actin	7.159838e-05	12.701184

Of course, if in your data the groups are are more meaningful than assays, then qpcr also offers the same functions in group-wise mode through groupwise_ttests and groupwise_anova.

Alright, with this we have reached the end of this tutorial on the statistical tests that can be directly performed using qpcr - congrats!