%matplotlib inline
import scipy.stats
import pandas
import matplotlib
import matplotlib.pyplot as plt
import seaborn
seaborn.set_context('notebook', font_scale=2)
First, we read in a table of race results manually compiled from the internet.
results = pandas.read_csv('race_results.csv')
print(results.to_string(index=False))
 year      race  Hugh  Jesse
 2013 Shore Run 22.22  19.42
 2015 Dawg Dash 20.17  19.27
 2017 Shore Run 20.37  19.93
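If `race_results.csv` is not at hand, the same table can be rebuilt directly from the values printed above. This is a minimal sketch that assumes the printed output is the complete contents of the file.

```python
import pandas

# Rebuild the race-results table from the printed output above
# (assumes race_results.csv contains exactly these three rows).
results = pandas.DataFrame({
    'year': [2013, 2015, 2017],
    'race': ['Shore Run', 'Dawg Dash', 'Shore Run'],
    'Hugh': [22.22, 20.17, 20.37],
    'Jesse': [19.42, 19.27, 19.93],
})
print(results.to_string(index=False))
```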
Next, we look at how the race results have changed over time for each runner, drawing a dashed line at the critical threshold of 20 minutes.
results_melt = pandas.melt(results, id_vars=['year'], value_vars=['Hugh', 'Jesse'],
                           var_name='runner', value_name='time')
# One line per runner, connecting that runner's time in each race year
seaborn.pointplot(x='year', y='time', data=results_melt, hue='runner')
plt.ylabel('time (minutes)')
plt.axhline(y=20, color='black', linestyle='--', linewidth=1)
plt.show()
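The plot relies on `pandas.melt` to turn the wide table (one column per runner) into long form (one row per runner per race). The sketch below shows what that reshaping produces and counts each runner's races under the 20-minute line. The values are copied from the printed table, and the `under_20` name is illustrative only.

```python
import pandas

# Wide table: one column per runner (values copied from the printed table above).
results = pandas.DataFrame({
    'year': [2013, 2015, 2017],
    'race': ['Shore Run', 'Dawg Dash', 'Shore Run'],
    'Hugh': [22.22, 20.17, 20.37],
    'Jesse': [19.42, 19.27, 19.93],
})

# melt stacks the 'Hugh' and 'Jesse' columns into a single 'time' column and
# records the original column name in 'runner'; 'race' is dropped because it
# is listed in neither id_vars nor value_vars.
results_melt = pandas.melt(results, id_vars=['year'], value_vars=['Hugh', 'Jesse'],
                           var_name='runner', value_name='time')
print(results_melt)

# Count the races each runner finished under the 20-minute threshold.
under_20 = (results_melt['time'] < 20).groupby(results_melt['runner']).sum()
print(under_20)
```

Every one of Jesse's times falls below the dashed line and none of Hugh's do.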
We use a statistical test to determine whether the two runners' times differ significantly.
First, we plot the distributions for the two runners.
# One point per race; `size` sets the marker diameter in points
seaborn.stripplot(data=results_melt, x='runner', y='time', size=10)
plt.ylabel('time (minutes)')
plt.show()
Then we test whether they differ significantly using the non-parametric Mann-Whitney U test.
scipy.stats.mannwhitneyu(results['Hugh'], results['Jesse'])
MannwhitneyuResult(statistic=0.0, pvalue=0.04042779918502612)
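The statistic and p-value above can be reproduced by hand. U counts the (Hugh, Jesse) pairs in which Hugh ran the faster time; since every one of Hugh's times is slower, U = 0. The reported p-value matches a one-sided test using the normal approximation with a continuity correction, which was the behaviour of `mannwhitneyu` when called without `alternative` in older SciPy releases. This sketch uses only the standard library. The `hugh`/`jesse` lists are illustrative names holding the table values.

```python
import math

hugh = [22.22, 20.17, 20.37]
jesse = [19.42, 19.27, 19.93]
n1, n2 = len(hugh), len(jesse)

# U for Hugh: pairs (h, j) where Hugh's time is lower than Jesse's (ties count 1/2).
u_hugh = sum((h < j) + 0.5 * (h == j) for h in hugh for j in jesse)
u_jesse = n1 * n2 - u_hugh
u_small = min(u_hugh, u_jesse)

# Normal approximation to the null distribution of U (no tie correction needed here).
mean_u = n1 * n2 / 2
sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)

# Continuity-corrected z and its ONE-sided upper-tail probability,
# 0.5 * erfc(z / sqrt(2)), i.e. the standard normal survival function.
z = (abs(u_small - mean_u) - 0.5) / sd_u
p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))
print(u_small, p_one_sided)
```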
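Taken at face value (P < 0.05), this suggests that Jesse's times are significantly faster than Hugh's. Note, however, that this P value is one-sided and comes from a normal approximation. With only three races per runner, the exact Mann-Whitney test cannot reach P < 0.05: its smallest attainable P values are 0.05 one-sided and 0.1 two-sided.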
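With three races per runner, the exact null distribution of U is small enough to enumerate. Under the null hypothesis every split of the six pooled times into two groups of three is equally likely, so there are C(6, 3) = 20 splits to check. The sketch below is an exact permutation test written in plain Python rather than with SciPy, and the helper `u_stat` is a hypothetical name. It shows that the observed U = 0 occurs in exactly one split. Recent SciPy releases can also compute exact p-values through the `method` argument of `mannwhitneyu`.

```python
from itertools import combinations

hugh = [22.22, 20.17, 20.37]
jesse = [19.42, 19.27, 19.93]
pooled = hugh + jesse


def u_stat(x, y):
    """Mann-Whitney U for sample x: pairs where x is lower than y, ties count 1/2."""
    return sum((a < b) + 0.5 * (a == b) for a in x for b in y)


observed = u_stat(hugh, jesse)

# Enumerate every way to label 3 of the 6 pooled times as "Hugh".
null_u = []
for idx in combinations(range(len(pooled)), len(hugh)):
    x = [pooled[i] for i in idx]
    y = [pooled[i] for i in range(len(pooled)) if i not in idx]
    null_u.append(u_stat(x, y))

n_splits = len(null_u)
# One-sided: splits in which "Hugh" is at least as slow as observed (U <= observed).
p_one_sided = sum(u <= observed for u in null_u) / n_splits
# Two-sided: splits at least as far from the null mean of U as the observed one.
mean_u = len(hugh) * len(jesse) / 2
p_two_sided = sum(abs(u - mean_u) >= abs(observed - mean_u) for u in null_u) / n_splits
print(n_splits, p_one_sided, p_two_sided)
```

Even the most extreme possible outcome (all of Jesse's times faster than all of Hugh's) gives an exact one-sided P of 1/20 = 0.05. More races would be needed before an exact test could fall below that threshold.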