I Just Felt Like Running

Running on 365 consecutive days for no other reason

March 2017 was a bad month for my running -- I only ran four times. By early April, I decided it would be cool to try to go on a little run streak to get things going again. At first, I think the goal was 10 days straight. And after that, 20 days. And then, I decided to make my longest streak longer than my longest break (34 days). You get the point. Now here we are, 365 days later.

I just felt like running

I use Runkeeper on my phone to track GPS coordinates of every run I go on, so in commemoration of an arbitrary set of 365 consecutive days, I thought it'd be cool to take a look at the data.

On Runkeeper, you can go to Settings, then Export Data, and select the appropriate date range. This gives you a zip file containing a CSV file (cardioActivities.csv) summarizing every tracked activity, and a GPX file for each activity containing the GPS coordinates sampled every few seconds.
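
If you want to follow along with your own export, unpacking that zip next to the notebook is all the setup needed. Here's a minimal sketch, assuming the download is named runkeeper-data-export.zip (yours will be named differently) and extracting into an activities directory to match the paths used below:

import zipfile

# Hypothetical file name -- use whatever Runkeeper actually called your export.
with zipfile.ZipFile('runkeeper-data-export.zip') as zf:
    zf.extractall('activities')  # cardioActivities.csv plus one .gpx per activity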

Summary

I'll start with cardioActivities.csv to just get a summary of how the year went.

In [1]:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

datadir = 'activities'
summary_csv = os.path.join(datadir, 'cardioActivities.csv')

activities = pd.read_csv(summary_csv)
activities.head()
Out[1]:
Date Type Route Name Distance (km) Duration Average Pace Average Speed (km/h) Calories Burned Climb (m) Average Heart Rate (bpm) Notes GPX File
0 2018-04-07 09:21:31 Running NaN 3.65 18:14 5:00 12.00 315.0 2 NaN NaN 2018-04-07-0921.gpx
1 2018-04-06 07:30:17 Running NaN 2.66 13:57 5:15 11.43 229.0 2 NaN NaN 2018-04-06-0730.gpx
2 2018-04-05 18:12:03 Running NaN 5.68 30:00 5:17 11.35 491.0 5 NaN NaN 2018-04-05-1812.gpx
3 2018-04-04 07:53:16 Running NaN 5.37 27:36 5:09 11.66 464.0 4 NaN NaN 2018-04-04-0753.gpx
4 2018-04-03 09:02:23 Running NaN 5.61 28:27 5:04 11.83 483.0 1 NaN NaN 2018-04-03-0902.gpx

The data needs a little cleaning up: removing cycling activities (tracked during "May is Bike Month") and converting the Date column to a proper datetime type. I also want to aggregate data for days on which I ran multiple times (usually to and from a location), so I'll add a separate column for the full date and time and strip the time from the Date column.

In [2]:
activities = activities[activities.Type == 'Running'].copy()
activities['Datetime'] = pd.to_datetime(activities['Date'])
activities['Date'] = activities['Datetime'].dt.normalize()  # keep the date, drop the time
activities.head()
Out[2]:
Date Type Route Name Distance (km) Duration Average Pace Average Speed (km/h) Calories Burned Climb (m) Average Heart Rate (bpm) Notes GPX File Datetime
0 2018-04-07 Running NaN 3.65 18:14 5:00 12.00 315.0 2 NaN NaN 2018-04-07-0921.gpx 2018-04-07 09:21:31
1 2018-04-06 Running NaN 2.66 13:57 5:15 11.43 229.0 2 NaN NaN 2018-04-06-0730.gpx 2018-04-06 07:30:17
2 2018-04-05 Running NaN 5.68 30:00 5:17 11.35 491.0 5 NaN NaN 2018-04-05-1812.gpx 2018-04-05 18:12:03
3 2018-04-04 Running NaN 5.37 27:36 5:09 11.66 464.0 4 NaN NaN 2018-04-04-0753.gpx 2018-04-04 07:53:16
4 2018-04-03 Running NaN 5.61 28:27 5:04 11.83 483.0 1 NaN NaN 2018-04-03-0902.gpx 2018-04-03 09:02:23

Now for the aggregation. Distance should be summed over all runs in a day, whereas it makes more sense to average the speed across a day's runs.

In [3]:
activities = activities.groupby('Date').agg({
    'Distance (km)': 'sum',
    'Average Speed (km/h)': 'mean',
    'Calories Burned': 'sum',
    'Climb (m)': 'sum',
}).reset_index()

activities.describe()
Out[3]:
Distance (km) Average Speed (km/h) Calories Burned Climb (m)
count 365.000000 365.000000 365.000000 365.000000
mean 7.269151 11.612251 628.715068 11.394521
std 4.485533 0.927397 390.872166 33.269181
min 1.870000 4.520000 64.000000 0.000000
25% 4.540000 11.230000 393.000000 2.000000
50% 6.060000 11.650000 523.000000 4.000000
75% 9.290000 12.100000 816.000000 9.000000
max 42.220000 14.950000 3654.000000 383.000000

There, now we have exactly 365 rows. One number of interest is the total distance run for the year.
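
Before adding anything up, a quick sanity check that the streak really is unbroken: every pair of consecutive dates in the aggregated frame should be exactly one day apart, so this should come back True.

# groupby sorted the dates, so each row's gap from the previous row should be exactly one day
(activities['Date'].diff().dropna() == pd.Timedelta(days=1)).all()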

In [4]:
activities['Distance (km)'].sum()
Out[4]:
2653.24

In terms of distance between state capitals, that's pretty close to Google's estimate of the walking distance from Sacramento, CA to Topeka, KS.
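
As a rough cross-check on that comparison, the great-circle distance between the two cities can be worked out with the haversine formula. A little sketch with approximate coordinates (the exact points don't matter much at this scale):

from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance on a sphere of radius ~6371 km.
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2)**2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2)**2
    return 2 * 6371 * asin(sqrt(a))

# Approximate coordinates for Sacramento, CA and Topeka, KS.
haversine_km(38.58, -121.49, 39.05, -95.68)

That comes out somewhere around 2,200 km as the crow flies, which is shorter than the walking route for the usual reason that roads don't go in a straight line.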

Aside from total distance, I've been curious to see how far I ran each day over the year.

In [5]:
import matplotlib.dates as mdates
import datetime

ax = plt.figure(figsize=(10, 5)).gca()

ax.plot(activities['Date'], activities['Distance (km)'])

def annotate_date(label, date, y, ax, **kwargs):
    def datestr2num(s):
        return mdates.date2num(datetime.datetime(*(int(i) for i in s.split('-'))))
    ax.annotate(label, xy=(datestr2num(date), y), **kwargs)

ax.axvspan('2017-07-10', '2017-07-16', color='k', alpha=0.2)
annotate_date('Scipy 2017', '2017-07-16', 30, ax)

annotate_date('Marathon 1', '2017-11-11', 42.2, ax)

ax.axvspan('2018-01-07', '2018-01-10', color='r', alpha=0.2)
annotate_date('flu :(', '2018-01-10', 15, ax)

ax.set_xlabel('date')
ax.set_ylabel('distance per day (km)')
Out[5]:
Text(0,0.5,'distance per day (km)')

You can see that I started out without dipping below 5km/day very much, had a couple rough days (SciPy 2017), then started training for my first marathon. After the marathon, things started falling apart and I started taking more 2-3km days. I also started lifting weights pretty seriously in December, which really saps day-to-day energy levels. 2018 also started off great with a pretty gnarly flu of some kind. I'm actually not sure if running with the flu for a few days was easier or harder than running the day after the marathon.

Looking at the GPS Data

I had been planning to do some neat analyses, but I'll leave those for another time and finish with a pretty picture that I can use for my Twitter banner or something.

I've run in a few places other than my home town, so I'll narrow things down to a bounding box around Davis, CA.

In [6]:
xlim = -121.8045788, -121.674863
ylim = 38.512625, 38.5765569

I need to re-read the summary CSV file because the aggregation earlier dropped the "GPX File" column.

In [7]:
activities = pd.read_csv(summary_csv)
activities = activities[activities.Type == 'Running']

Now I can make use of a nice library called gpxpy to parse the GPX files and extract the GPS coordinates from them.

In [8]:
import gpxpy

def loadgpx(fpath):
    with open(fpath, 'r') as f:
        return gpxpy.parse(f)

def points(gpx):
    # Collect (longitude, latitude) for every track point in the file.
    xy = np.zeros((gpx.get_points_no(), 2))
    for i, p in enumerate(gpx.walk(only_points=True)):
        xy[i] = p.longitude, p.latitude
    return xy

Now generate the plot by iterating over the GPX files and plotting the coordinates along the way. This takes a little while to run, so I downsample to roughly one coordinate every 15 seconds.

In [9]:
width = xlim[1] - xlim[0]
height = ylim[1] - ylim[0]
ratio = abs(width / height)

fig = plt.figure(figsize=(10, 10/ratio))
ax = fig.gca()

for fname in activities['GPX File']:
    fp = os.path.join(datadir, fname)
    xy = points(loadgpx(fp))[::5, :]
    ax.plot(xy[:, 0], xy[:, 1], lw=0.2, alpha=0.5, color='k')
    
ax.set_xlim(*xlim)
ax.set_ylim(*ylim)
Out[9]:
(38.512625, 38.5765569)
In [10]:
fig.savefig('map.png', dpi=600)

Conclusions

The question now is, do I run tomorrow?