$t$-tests are among the most commonly performed statistical tests in the world.
This notebook focuses on the practicalities of performing $t$-tests in Python.
For information about the $t$-test itself, I recommend reading Laerd Statistics' Independent t-test using SPSS Statistics.
# Efficient numerical arrays.
import numpy as np
# Data frames.
import pandas as pd
# Alternative statistics package.
import statsmodels.stats.weightstats as stat
# Main statistics package.
import scipy.stats as ss
# Plotting.
import matplotlib.pyplot as plt
# Fancier plotting.
import seaborn as sns
# Better sized plots.
plt.rcParams['figure.figsize'] = (12, 8)
# Nicer colours and styles for plots.
plt.style.use("fivethirtyeight")
We can create fake data sets with specific properties to investigate numerical methods.
# Parameters for two different lists of numbers.
m_a, s_a, m_b, s_b = 1.0, 0.4, 2.0, 0.4
# Sample size.
N = 40
# Create two lists of numbers based on bell-shaped probability curves.
a = np.random.normal(loc=m_a, scale=s_a, size=N)
b = np.random.normal(loc=m_b, scale=s_b, size=N)
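Note that these samples change on every run. For reproducible output you can seed NumPy's generator before sampling; a minimal sketch (the seed value is arbitrary):
# Seeding makes subsequent np.random calls repeatable.
np.random.seed(42)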
# Stick both samples in one data frame.
df = pd.DataFrame({'Category': ['A'] * len(a) + ['B'] * len(b), 'Value': np.hstack([a,b])})
# We can look directly at the list of numbers, but it's not very illuminating.
df
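Summary statistics give a more digestible view of the two samples:
# Count, mean, standard deviation and quartiles for each category.
df.groupby('Category')['Value'].describe()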
# One type of plot available in seaborn.
sns.catplot(x='Category', y='Value', jitter=False, data=df);
Running a $t$-test in Python takes a single function call. You can use scipy or statsmodels, amongst others.
# The scipy.stats version.
t_ss, p_ss = ss.ttest_ind(a, b)
print(f"P_scipy: {p_ss:0.2f}")
# The statsmodels version also returns the degrees of freedom.
t_sm, p_sm, d_sm = stat.ttest_ind(a, b)
print(f"P_statsmodels: {p_sm:0.2f}")
$t$-tests perform calculations on samples from two populations to test whether the population means are likely to differ.
In the real world, we only see the samples and we cannot see the populations.
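All we can do is estimate the population parameters from the samples. A quick check of how close those estimates come to the true values used above:
# Sample means and (Bessel-corrected) standard deviations versus the true parameters.
print(f"Sample A: mean {a.mean():0.2f} (true {m_a}), sd {a.std(ddof=1):0.2f} (true {s_a})")
print(f"Sample B: mean {b.mean():0.2f} (true {m_b}), sd {b.std(ddof=1):0.2f} (true {s_b})")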
# Let's create a plot with the following x values.
x = np.linspace(-2.0, 4.0, 1000)
# We'll have plots of two different populations on one set of axes.
y_a = ss.norm.pdf(x, m_a, s_a)
y_b = ss.norm.pdf(x, m_b, s_b)
fig, ax = plt.subplots(figsize=(10,6))
ax.plot(x, y_a)
ax.plot(x, y_b)
plt.show()
The $t$ statistic calculated from the samples is compared to a critical value to make a decision. If the probability of seeing such a $t$ value under the hypothesis that there is no difference between the means is small, then the data suggest that you should reject that hypothesis.
# This code just builds the plot below.
x_t = np.linspace(-4.0, 4.0, 1000)
t = ss.t.pdf(x_t, d_sm)
tf = pd.DataFrame({'x': x_t, 't': t})
tcrit = abs(ss.t.ppf(0.025, d_sm))
one = tf[tf['x'] >= tcrit]
two = tf[tf['x'] <= -tcrit]
fig, ax = plt.subplots(figsize=(10,6))
ax.plot(x_t, t)
ax.fill_between(one['x'], one['t'], 0, facecolor="red")
ax.fill_between(two['x'], two['t'], 0, facecolor="red")
plt.show()
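The shaded tails mark the rejection region. Equivalently, the decision can be made numerically: rejecting when $|t|$ exceeds the critical value is the same as rejecting when the p-value is below 0.05.
# The two forms of the decision rule agree.
print(f"|t| = {abs(t_sm):0.2f}, critical value = {tcrit:0.2f}")
print(f"Reject by t: {abs(t_sm) > tcrit}; reject by p: {p_sm < 0.05}")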
# Let's run 10000 t-tests where the means are equal.
# We should wrongly reject the hypothesis about (100 * critical) percent of the time.
trials = 10000
N = 100
m_a, m_b, s = 2.0, 2.0, 0.3
rejects = 0
critical = 0.05
for i in range(trials):
    a = np.random.normal(loc=m_a, scale=s, size=N)
    b = np.random.normal(loc=m_b, scale=s, size=N)
    if ss.ttest_ind(a, b)[1] <= critical:
        rejects = rejects + 1
typei = 100.0 * (rejects / trials)
print(f"{typei:0.2f}%")
The chance of a false negative (a Type II error) is harder to quantify: it depends on how far apart the means really are.
trials = 10000
N = 100
m_a, m_b, s = 2.0, 2.1, 0.3
dont = 0
for i in range(trials):
    a = np.random.normal(loc=m_a, scale=s, size=N)
    b = np.random.normal(loc=m_b, scale=s, size=N)
    if ss.ttest_ind(a, b)[1] > 0.05:
        dont = dont + 1
typeii = 100.0 * (dont / trials)
print(f"{typeii:0.2f}%")
Here we try a slightly different $t$-test: a paired test, based on repeated measures of the same subjects.
References for this section:
Vincent Arel-Bundock's R datasets list
dfsleep = pd.read_csv("https://vincentarelbundock.github.io/Rdatasets/csv/datasets/sleep.csv")
dfsleep
# Extract the group 1 (drug A) measurements, ordered by subject ID, as a NumPy array.
drugA = dfsleep[dfsleep["group"] == 1]
drugA = drugA.sort_values("ID")
drugA = drugA["extra"].to_numpy()
drugA
# Extract the group 2 (drug B) measurements in the same subject order.
drugB = dfsleep[dfsleep["group"] == 2]
drugB = drugB.sort_values("ID")
drugB = drugB["extra"].to_numpy()
drugB
# A paired t-test on the two sets of measurements...
ss.ttest_rel(drugA, drugB)
# ...is equivalent to a one-sample t-test on the differences...
ss.ttest_1samp(drugB - drugA, 0)
# ...and statsmodels gives the same result.
stat.DescrStatsW(drugB - drugA).ttest_mean(0)
# Set parameters for two populations.
popA = {'m': 1.6, 's': 0.1}
popB = {'m': 1.8, 's': 0.1}
# Create two samples, one from each population.
sampA = np.random.normal(popA['m'], popA['s'], 100)
sampB = np.random.normal(popB['m'], popB['s'], 100)
# x values for plotting.
x = np.linspace(1.25, 2.25, 1000)
# The probability density functions (PDFs) for the two populations.
pdfA = ss.norm.pdf(x, popA['m'], popA['s'])
pdfB = ss.norm.pdf(x, popB['m'], popB['s'])
# Plot the population PDFs as shaded regions.
plt.fill_between(x, pdfA, color='g', alpha=0.25, label="Population A")
plt.fill_between(x, pdfB, color='b', alpha=0.25, label="Population B")
# Plot histograms of the two samples.
plt.hist(sampA, density=True, color='g', alpha=0.25, label="Sample A")
plt.hist(sampB, density=True, color='b', alpha=0.25, label="Sample B")
# Display a legend.
plt.legend()
plt.show()
Suppose we want to compare three groups. Can we simply run three pairwise $t$-tests?
# Size of each sample.
N = 100
# Create three samples.
sampA = np.random.normal(1.0, 0.2, N)
sampB = np.random.normal(1.0, 0.2, N)
sampC = np.random.normal(2.0, 0.2, N)
# Put samples in a single data frame.
sample = ['A'] * N + ['B'] * N + ['C'] * N
values = np.hstack([sampA, sampB, sampC])
dfsamps = pd.DataFrame({'Sample': sample, 'Value': values})
# Visualise samples.
sns.catplot(x='Sample', y='Value', jitter=False, data=dfsamps);
# t-Tests
t_AB, p_AB = ss.ttest_ind(sampA, sampB)
t_AC, p_AC = ss.ttest_ind(sampA, sampC)
t_BC, p_BC = ss.ttest_ind(sampB, sampC)
print(f"p_AB: {p_AB:.2f}\tp_AC: {p_AC:.2f}\tp_BC: {p_BC:.2f}")
# Let's run 1000 three-group experiments where all the means are equal,
# counting how often at least one of the three t-tests rejects.
falsepos = 0
for i in range(1000):
    A = np.random.normal(1.0, 0.2, N)
    B = np.random.normal(1.0, 0.2, N)
    C = np.random.normal(1.0, 0.2, N)
    t_AB, p_AB = ss.ttest_ind(A, B)
    t_AC, p_AC = ss.ttest_ind(A, C)
    t_BC, p_BC = ss.ttest_ind(B, C)
    if p_AB <= 0.05 or p_AC <= 0.05 or p_BC <= 0.05:
        falsepos = falsepos + 1
print(f"False positive rate: {falsepos / 10}%")
# One-way ANOVA tests all three groups simultaneously, avoiding the inflated false positive rate.
F, P = ss.f_oneway(sampA, sampB, sampC)
print(f"{P:.2f}")