Analysis of 5K race results from races run by both Jesse and Hugh¶

This notebook analyzes 5K races run by Jesse Bloom and Hugh Haddox.

import modules¶

In [1]:
%matplotlib inline
import scipy.stats
import pandas
import matplotlib
import matplotlib.pyplot as plt
import seaborn
seaborn.set_context('notebook', font_scale=2)


First, we read in a table of race results manually compiled from the internet.

In [2]:
results = pandas.read_csv('race_results.csv')
print(results.to_string(index=False))

year       race   Hugh  Jesse
2013  Shore Run  22.22  19.42
2015  Dawg Dash  20.17  19.27
2017  Shore Run  20.37  19.93


Plot race results over time¶

Next, we look at how the race results have changed over time for each runner, drawing a dashed line at the critical threshold of 20 minutes.

In [3]:
results_melt = pandas.melt(results, id_vars=['year'], value_vars=['Hugh', 'Jesse'],
var_name='runner', value_name='time')

seaborn.pointplot(x='year', y='time', data=results_melt, hue='runner', fit_reg=False)
plt.ylabel('time (minutes)')
plt.axhline(y=20, color='black', linestyle='--', linewidth=1)
plt.show()


Examine if there is a significant difference between runners¶

We perform statistical tests to determine if the runners have significantly different times.

First, we plot the distributions for the two runners.

In [4]:
seaborn.stripplot(data=results_melt, x='runner', y='time', s=10)
plt.ylabel('time (minutes)')
plt.show()


Then we test if they are significantly different using the non-parametric Mann-Whitney test.

In [5]:
scipy.stats.mannwhitneyu(results['Hugh'], results['Jesse'])

Out[5]:
MannwhitneyuResult(statistic=0.0, pvalue=0.04042779918502612)

Therefore, we can conclude with P < 0.05 that Jesse's times are significantly faster than Hugh's.