Reviewing my marathon training using MapMyFitness and Pandas¶

I'm training for a marathon and I use MapMyFitness (MMF) on my iPhone to track my mileage and pace for each workout. MMF has a public API and Jason Sanford has written a Python front end for it, which means that I can easily get hold of all my data in Python and explore it with Pandas! To run this notebook, you'll need:

A scientific Python stack (I recommend Anaconda)
A MapMyFitness account and some recorded workout data
A MapMyFitness API key and access token (go to https://www.mapmyapi.com/io-docs)

You should also

pip install mapmyfitness

Here we go...

In [1]:

%matplotlib inline
import pandas as pd
from mapmyfitness import MapMyFitness
import matplotlib.pyplot as plt
import matplotlib
import numpy as np
pd.options.display.mpl_style = 'default'

The next cell logs into MMF, grabs all my workout data, filters for a specified activity type (running, in my case), and extracts the date, distance and pace for each workout. You'll need to enter your API key and access token to use it.

In [2]:

def get_workouts(verbose=True, workout_type='run'):
    # Log in
    mmf = MapMyFitness(api_key='your-key',
                       access_token='your-token')

    # get all workouts
    workout_pages = mmf.workout.search(user=48155002,per_page=40,cache_finds=True)  # doesn't work if per_page>40

    paces     = []
    distances = []
    dates     = []
    workouts  = []

    for pagenum in workout_pages.page_range:
        workout_list = workout_pages.page(pagenum)

        for i,workout in enumerate(workout_list):
            if verbose:
                print("processing workout " + str(i+1) + " of " + str(len(workout_list)))
            if workout_type in workout.activity_type.root_activity_type.name.lower():
                workouts.append(workout)
                distances.append(workout.distance_total/1609.344) # convert meters to miles
                paces.append(26.8224/workout.speed_avg) # convert m/s to minutes per mile
                dates.append(workout.start_datetime)
    return distances, paces, dates, workouts

Be warned that this function takes a while. With verbose=True (the default) it prints a progress line for each workout; pass verbose=0, as in the next cell, to silence it.
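The two conversion constants in get_workouts are easy to misread, so here is a minimal sketch of where they come from. The helper names are made up for illustration and are not part of the mapmyfitness package. The guard against a missing or zero speed is also an assumption about what the API might return, not something I have observed.

```python
METERS_PER_MILE = 1609.344  # exact, by definition of the international mile

def meters_to_miles(meters):
    """Convert a distance in meters to miles."""
    return meters / METERS_PER_MILE

def speed_to_pace(speed_m_per_s):
    """Convert a speed in m/s to a pace in minutes per mile.

    Returns None when the speed is missing or zero (hypothetical guard),
    since the pace is undefined in that case.
    """
    if not speed_m_per_s:
        return None
    # 1609.344 m per mile / 60 s per minute = 26.8224, the constant used above
    return (METERS_PER_MILE / 60.0) / speed_m_per_s
```

For example, speed_to_pace(3.0) gives roughly 8.94 minutes per mile.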

In [3]:

distances, paces, dates, workout_list = get_workouts(verbose=0)

/Users/ketch/anaconda/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:730: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html (This warning will only appear once by default.)
  InsecureRequestWarning)

Creating a Pandas DataFrame¶

The basic Pandas object we'll use is a DataFrame. We can create it as follows:

In [4]:

dist_ts = pd.Series(distances,index=dates)
pace_ts = pd.Series(paces,index=dates)
df = pd.DataFrame({'Distance': dist_ts, 'Pace': pace_ts})

Let's see what's in it. Rather than printing the whole (long) table, I'll use head to print the first few rows.

In [5]:

df.head()

Out[5]:

	Distance	Pace
2014-09-02 03:24:23+00:00	1.69787	9.307203
2014-09-04 16:49:09+00:00	2.05400	9.356798
2014-09-06 18:10:46+00:00	2.90366	9.370637
2014-09-08 03:00:49+00:00	3.36122	9.943686
2014-09-10 03:02:17+00:00	3.09783	9.788455

As you can see, the workouts are sorted chronologically. I started training (and using MapMyFitness) at the beginning of September, about 4 months ago. At the time I could only comfortably run 2-3 miles, and my pace was slower than 9 minutes per mile.

Basic plotting¶

Now we can easily plot workout distance and pace versus date:

In [6]:

fs = 15
plotargs = {'figsize' : (12,4), 'fontsize' : fs}
df.Distance.plot(title='Workout distance (miles)',lw=2,**plotargs);

In [7]:

df.Pace.plot(title='Average pace (minutes per mile)',lw=2,**plotargs);

Clearly, I'm running farther and faster as my training program progresses! My pace is down to around 8 minutes per mile, and my typical runs are 5 miles or more. Here's a histogram of the distances for all my workouts since I started:

In [8]:

df.Distance.plot(kind='hist',title='Workout distance (miles)');
plt.xlabel('Miles'); plt.ylabel('Number of runs');

How far have I run in total? Here's a cumulative distance plot, showing that I've run more than 200 miles.

In [9]:

df.Distance.cumsum().plot(title='Total workout distance (miles)',lw=2,**plotargs);

Stacked histogram separated by dates¶

Let's see how my paces in the last two months compare to those in the first two months. I suspect Pandas has a better way to do this than what I've implemented below, but this works...

In [10]:

pace1 = []
pace2 = []

for date,pace in zip(df.index,df['Pace']):
    if date.month<11:
        pace1.append(pace)
        pace2.append(np.nan)
    else:
        pace1.append(np.nan)
        pace2.append(pace)
        
# Index by df.index (the order the loop iterated in) so each pace stays with its own date
df2 = pd.DataFrame({'First two months pace' : pd.Series(pace1,index=df.index), 
                    'Last two months pace' : pd.Series(pace2,index=df.index)})
df2.plot(kind='hist',stacked=True,fontsize=fs); plt.ylabel('# of runs');

Again, it's clear that my pace has improved.
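For what it's worth, one more Pandas-native way to do the split above might be a boolean mask on the index together with Series.where, which keeps the values where the mask is true and fills NaN elsewhere. This is only a sketch: df_demo and its values are made up so that the cell stands on its own, and the month cutoff is the same one used in the loop.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the notebook's df (illustrative values only)
demo_dates = pd.to_datetime(['2014-09-02', '2014-10-15',
                             '2014-11-22', '2014-12-27'], utc=True)
df_demo = pd.DataFrame({'Pace': [9.3, 9.0, 8.5, 8.7]}, index=demo_dates)

# True for workouts before November, False otherwise
early = df_demo.index.month < 11

# where() keeps values where the mask is True and puts NaN elsewhere
df2_demo = pd.DataFrame({
    'First two months pace': df_demo['Pace'].where(early),
    'Last two months pace': df_demo['Pace'].where(~early),
})
```

The resulting frame has the same two NaN-padded columns as the loop version, so it can be passed to the same plot(kind='hist', stacked=True) call.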

Longest runs¶

Here are my five longest runs:

In [11]:

df.sort('Distance')[-5:][::-1]

Out[11]:

	Distance	Pace
2014-12-27 04:39:34+00:00	10.28480	8.698211
2014-12-06 04:36:18+00:00	9.00748	8.506801
2014-12-13 04:39:00+00:00	8.97491	8.635715
2014-11-22 03:09:55+00:00	8.47542	8.469062
2014-11-15 03:09:18+00:00	7.65717	8.535385
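A note for anyone running this on a more recent Pandas: DataFrame.sort was later removed in favor of sort_values, and nlargest does the same job in one call. The sketch below uses a small made-up table rather than my actual data.

```python
import pandas as pd

# Made-up distances and paces, for illustration only
demo = pd.DataFrame({'Distance': [1.7, 10.3, 9.0, 2.1, 8.5, 7.7],
                     'Pace':     [9.3, 8.7, 8.5, 9.4, 8.5, 8.5]})

# Five largest distances, longest first
top5 = demo.nlargest(5, 'Distance')

# Equivalent spelling with an explicit sort
top5_alt = demo.sort_values('Distance', ascending=False).head(5)
```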

Weekly and monthly data¶

Next we aggregate data for each week and each month:

In [12]:

weekly  = df.resample('W',how=['mean','sum'],kind='period')
monthly = df.resample('M',how=['mean','sum'],kind='period')
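The how= argument used above is no longer accepted by recent Pandas releases. There, the same aggregation is written by chaining agg onto the resampler. Here is a sketch with made-up data; it omits kind='period', so the rows are labeled by the timestamp that ends each week rather than by a period.

```python
import pandas as pd

# Made-up workouts: two in the week ending Sunday 2014-09-07, one in the next week
demo_dates = pd.to_datetime(['2014-09-01', '2014-09-03', '2014-09-08'], utc=True)
df_demo = pd.DataFrame({'Distance': [2.0, 3.0, 4.0],
                        'Pace':     [9.0, 10.0, 8.0]}, index=demo_dates)

# Weekly bins; each column gets both a mean and a sum
weekly_demo = df_demo.resample('W').agg(['mean', 'sum'])

weekly_miles = weekly_demo['Distance']['sum']   # total miles per week
weekly_pace = weekly_demo['Pace']['mean']       # average pace per week
```

The columns come back as a two-level index, so weekly_demo['Distance']['sum'] selects the same thing as weekly.Distance['sum'] in the cells that follow.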

Now we can plot the average pace for each week:

In [13]:

fig = weekly.Pace['mean'].plot(title='Average pace',**plotargs)
fig.set_ylabel('minutes per mile',fontsize=fs);

The total mileage per week:

In [14]:

fig = weekly.Distance['sum'].plot(kind='bar',title='Total miles per week',**plotargs)
fig.set_ylabel('Miles',fontsize=fs);

The plot above shows that I have not been terribly consistent, and have missed a number of workouts due to travel or sickness. For instance, the second week of October I was on vacation in Jordan, and the last week of November I was in Mexico on business.

Or the average mileage per run, by month:

In [15]:

fig = monthly.Distance['mean'].plot(kind='bar',title='Average miles per run',**plotargs)
fig.set_ylabel('Miles',fontsize=fs);

Plotting the routes¶

MapMyFitness records the actual routes, so we can also plot them.

In [16]:

import mpld3
mpld3.enable_notebook()

plt.figure(figsize=(8,8))

for w in workout_list:
    if w.route is not None:
        points = [(p['lat'],p['lng']) for p in w.route.points()]
        lat, long = zip(*points)  # split the (lat, lng) pairs into two tuples
        if min(long)>39:  # Omit workouts in other countries
            plt.plot(long,lat)

The map above is zoomable in the actual notebook (thanks to mpld3). You can see a lot of the roads at KAUST, as well as the running paths around the Gardens and through Thuwal park on the Island (see the squiggly line on the top left). The path to the beacon is also obvious. The line at the far right goes to the KAUST stadium, with a partial lap around the track. You can even see where the road to the south Beach has moved due to construction.

If you're not familiar with KAUST, just compare the plot above with a satellite map.
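The route-plotting loop relies on two small idioms: zip(*points) to turn a list of (lat, lng) pairs into separate coordinate tuples, and a longitude cutoff to drop runs recorded elsewhere. Here is a stripped-down sketch with no plotting. The route data is made up; only the 'lat' and 'lng' keys are taken from the cell above.

```python
# Made-up routes, each a list of points with 'lat' and 'lng' keys
routes = [
    [{'lat': 22.3110, 'lng': 39.0925}, {'lat': 22.3111, 'lng': 39.0926}],  # local run
    [{'lat': 31.9500, 'lng': 35.9300}, {'lat': 31.9510, 'lng': 35.9310}],  # run elsewhere
]

kept = []
for points in routes:
    pairs = [(p['lat'], p['lng']) for p in points]
    lats, lngs = zip(*pairs)      # transpose: one tuple of lats, one of lngs
    if min(lngs) > 39:            # same longitude cutoff as in the plotting cell
        kept.append((lngs, lats)) # x = longitude, y = latitude, ready for plt.plot
```

Only the first route survives the filter, because every one of its points lies east of 39 degrees longitude.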

Using GeoPandas¶

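I haven't gotten this to work yet... Note that the points below are built from zip(lat,long), so the x coordinate of each Point holds the latitude and the y coordinate holds the longitude.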

In [17]:

from geopandas import GeoSeries, GeoDataFrame
from shapely.geometry import Polygon, Point
from descartes import PolygonPatch

In [18]:

s = GeoSeries([Point(x, y) for x, y in zip(lat,long)])

In [19]:

s.head()

Out[19]:

0    POINT (22.3110425939 39.0925496905)
1     POINT (22.3110589707 39.092546193)
2    POINT (22.3110709805 39.0925410949)
3    POINT (22.3110202223 39.0925507253)
4    POINT (22.3109895718 39.0925490764)
dtype: object

In [20]:

df = GeoDataFrame()
df['geometry'] = s
df.geometry.total_bounds

Out[20]:

(22.3029487047, 39.092172588499999, 22.324501295800001, 39.098822046999999)