Quantified Self: The Value of Wearables

by Maarten Grootendorst

The Problem
In the last few years, wearables have taken the market by storm. They come in the form of watches, activity trackers, apps for mobile devices and many more. Fitbit, Apple, TomTom, Garmin, Samsung and many other companies have taken it upon themselves to release the most advanced wearable in the form of a watch that has the best user experience.

Similarly, machine learning (which consists of predicting events, data etc.) is growing in popularity and is being implemented more and more. For example, we can use machine learning to predict how well you're going to run based on what you've eaten in the past and how that affected your running.

I would like to propose something different that concerns both user experience and hardware issues. A known problem of activity trackers that measure your heart rate is that the measurements are inconsistent and often missing altogether. This can be due to hairy arms, sweating while working out or using the device in the shower. The user may then see that some data is missing, which can be frustrating. Below you can see an example in my data. It shows that there is no heartrate for roughly an hour even though I was wearing the device! I was working out and sweating quite a lot, which may have been what caused the difficulties.

Moreover, many people deal with a dangerously high heartrate due to certain patterns in their behaviour. Currently, Fitbit doesn't tell you anything if your heartrate is too high. You have to figure that out yourself.

The Solution
Here, I'm not proposing a way to measure the heartrate more accurately, but a model that can predict the missing values based on past information about your heartrate, steps, distance etc. This will allow users to see a filled graph of the daily heartrate without any missing values.
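To make "information from the past" concrete, here is a minimal sketch of how lagged features could be built with pandas. The toy values, the `add_past_features` helper and the choice of a 1- and 2-minute lag are assumptions for illustration; they are not taken from the modeling notebook.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the intraday data: one row per minute (values are made up).
toy = pd.DataFrame({
    "steps":     [0, 0, 12, 40, 55, 30, 0, 0],
    "distance":  [0.0, 0.0, 0.01, 0.03, 0.04, 0.02, 0.0, 0.0],
    "heartrate": [62, 63, 70, 88, np.nan, np.nan, 75, 68],
})

def add_past_features(df, lags=(1, 2)):
    """Hypothetical helper: add each signal's value from `lag` rows (minutes) earlier."""
    out = df.copy()
    for lag in lags:
        for col in ("steps", "distance", "heartrate"):
            out[f"{col}_lag{lag}"] = out[col].shift(lag)
    return out

features = add_past_features(toy)
print(features[["heartrate", "heartrate_lag1", "steps_lag1"]])
```

Each row now carries what happened one and two minutes before it, which a model could use as input when the current heartrate is missing.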

Companies developing wearables can use the model to improve the user experience. Moreover, predicting heartrate will allow the device to recommend certain actions to prevent a heartrate that is too high by simply sending an alert.

Note: If a wearable watch measures heartrate incorrectly (which is difficult to establish), then the mean of the tracker's measurement and the algorithm's prediction may be used to improve the measurement. However, this works best if the device can recognize that it has difficulties measuring heartrate.
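The averaging idea in the note could be sketched as follows. This assumes the device can supply a flag marking unreliable readings; the `blend_heartrate` function and the example numbers are hypothetical.

```python
import numpy as np

def blend_heartrate(measured, predicted, unreliable):
    """Average measured and predicted values where a reading is flagged unreliable."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    unreliable = np.asarray(unreliable, dtype=bool)
    blended = measured.copy()
    # Only flagged readings are replaced by the mean of measurement and prediction.
    blended[unreliable] = (measured[unreliable] + predicted[unreliable]) / 2
    return blended

# Made-up example: the second reading (120) is flagged as unreliable.
blended = blend_heartrate([60, 120, 64], [62, 70, 65], [False, True, False])
print(blended)
```

Readings that are not flagged are left untouched, so the blend only affects minutes where the tracker itself reports trouble.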

The Code
In order to show you how the algorithm works and what it can do, I have to use some (depending on your background) difficult code. Don't worry, I will explain the general principles and focus mostly on the result.

The code used for cleaning the data can be found in "0. Cleaning Data" if you are interested. Below, I will simply load in the data after cleaning.

In [1]:

import seaborn as sns
import time
import pandas as pd
import fitbit_helper as fh
import numpy as np

%matplotlib inline
%load_ext rpy2.ipython

intraday = pd.read_csv('../Files/intraday.csv')
summaries = pd.read_csv('../Files/summaries.csv')
sleep = pd.read_csv('../Files/sleep.csv')

The Data - What can be Measured?

So now I've imported the data, but what is the data? For the last month I have worn a Fitbit Charge 2 that measures heartrate, steps, distance, sleep and many more things. I've tried to wear it 24/7, which I managed most of the time. My goal was to get more insight into my behavior, but mostly I did it because I like analyzing data, especially when it's my own. Moreover, I used an app called Moves that allows me to track my locations and save the data. Below you can see an example of the data that is tracked by Moves:

My measured heart rate is still the main focus of this notebook; the map merely shows what you can do with a simple app and some ggplot2 code.

Now, let's see what data I have after importing and cleaning.

In [3]:

fh.print_columns(intraday, 'intraday')
fh.print_columns(sleep, 'sleep')
fh.print_columns(summaries, 'summaries')

For the data "intraday", we have 47520 datapoints and the following features:
- time
- floors
- distance
- calories
- steps
- heartrate

For the data "sleep", we have 33 datapoints and the following features:
- efficiency
- endTime
- minutesAsleep
- minutesAwake
- startTime
- time
- timeInBed

For the data "summaries", we have 31 datapoints and the following features:
- activityCalories
- calories
- caloriesBMR
- distance
- elevation
- floors
- minutesFairlyActive
- minutesLightlyActive
- minutesSedentary
- minutesVeryActive
- steps

There are three categories for me to work on:

Intraday: Describes features that were measured in intervals of 1 minute like heartrate and distance (movement of the fitbit)
Sleep: Describes my sleep of each night
Summaries: Gives a summary of daily activities

In this notebook, I will be focusing mostly on intraday features as the main problem concerns the missing heartrates and looking at the trend of heart rate.

The Visuals

It might be difficult to picture what data I have, so I will visualize it for you! Let's start by plotting all intraday data I have.
Intraday is the data that is measured every minute, like the steps, floors, heartrate, distance and calories burned.

As you can see all intraday features follow a certain pattern. If my distance is higher, then it is likely I have taken some steps or I'm doing some active task which tends to increase my heartrate. This plot made me think that it should be possible to accurately predict your heartrate. But before that, let's take a look at the data in more detail.

My heart rate is mostly distributed around 60 beats per minute, which is a healthy resting heart rate for people my age. It is interesting to see that it goes up to 140, but never over that. Moreover, my heart rate peaks during the evening and around 11 am. During the evenings I'm often working out, which might explain the peak. During the day I work a lot and have more stress, which might explain the increasing heart rates.
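The numbers behind an hourly heart rate plot and a distribution plot can be computed with a few lines of pandas and NumPy. This is a minimal sketch on made-up readings; it assumes the `time` column can be parsed as a timestamp, which may not match the cleaned file exactly.

```python
import numpy as np
import pandas as pd

# Toy per-minute readings (dates and values are made up).
toy = pd.DataFrame({
    "time": pd.to_datetime([
        "2017-01-01 10:58", "2017-01-01 10:59",
        "2017-01-01 11:00", "2017-01-01 11:01",
        "2017-01-01 19:30", "2017-01-01 19:31",
    ]),
    "heartrate": [64, 66, 80, 84, 120, 130],
})

# Mean heart rate per hour of the day.
hourly_mean = toy.groupby(toy["time"].dt.hour)["heartrate"].mean()

# Distribution: number of readings in 20-bpm bins from 40 to 140.
counts, edges = np.histogram(toy["heartrate"], bins=[40, 60, 80, 100, 120, 140])

print(hourly_mean)
print(counts)
```

On the real data, `hourly_mean` would give one value per hour across all days, which is the kind of summary the Fitbit app does not show.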

So far, we've seen four plots that all give a lot of information that Fitbit actually doesn't give you. The app has no distribution of heartrate, no hourly heartrate, no visual representation of activity and no overview of all data. It does a good job of showing some basics, but there are limitations to customization. Next up is the main issue: missing data.

The Solution

To reiterate the problem: in some cases Fitbit has trouble measuring the heartrate, which may lead to less accurate measurements and, most importantly, missing data.

In [2]:

fh.count_missing_data(intraday)

Out[2]:

	Total	Percent
heartrate	281	0.005913
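A table like this can be built with plain pandas. The sketch below is an assumption about what such a helper might do, not the actual `fh.count_missing_data` code. Note that 281 / 47520 ≈ 0.0059, so the "Percent" column in the output appears to hold a fraction (about 0.59%).

```python
import numpy as np
import pandas as pd

def count_missing(df):
    """Hypothetical helper: count and fraction of missing values per affected column."""
    total = df.isnull().sum()
    fraction = total / len(df)
    table = pd.concat([total, fraction], axis=1, keys=["Total", "Percent"])
    # Keep only the columns that actually have missing values.
    return table[table["Total"] > 0]

# Made-up example with two missing heart rate readings out of four rows.
toy = pd.DataFrame({"heartrate": [60, np.nan, 62, np.nan], "steps": [0, 1, 2, 3]})
missing = count_missing(toy)
print(missing)
```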

The output of count_missing_data shows that only a small share of the heartrate values is missing (281 of 47520 datapoints, roughly 0.6%), but it is enough that you can see it when plotting the data and when viewing it in the app. So how would I solve this problem? There are many solutions for working with missing data:

Impute the missing values with the mode, median or mean
Remove the missing values
Use an algorithm to predict the missing values
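The three options can be compared on a small made-up example. The straight-line fit of heartrate on steps is only a stand-in for "an algorithm"; it is an assumption for illustration and not one of the models from the modeling notebook.

```python
import numpy as np
import pandas as pd

# Toy data (made up): two heart rate readings are missing.
toy = pd.DataFrame({
    "steps":     [0, 10, 20, 30, 40, 50],
    "heartrate": [60.0, 65.0, np.nan, 75.0, np.nan, 85.0],
})

# 1. Impute with a summary statistic (the median here).
imputed = toy["heartrate"].fillna(toy["heartrate"].median())

# 2. Remove the rows with a missing heart rate.
dropped = toy.dropna(subset=["heartrate"])

# 3. Predict the missing values from another feature (stand-in linear fit).
known = toy["heartrate"].notna()
slope, intercept = np.polyfit(toy.loc[known, "steps"], toy.loc[known, "heartrate"], deg=1)
predicted = toy["heartrate"].copy()
predicted.loc[~known] = slope * toy.loc[~known, "steps"] + intercept

print(imputed.tolist())
print(len(dropped))
print(predicted.round(1).tolist())
```

The median fills both gaps with the same value, dropping leaves holes in the timeline, and the prediction follows the activity level in each missing minute.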

I chose to predict the missing values, as I believe that leads to the most accurate values. After some initial cleaning of the data I created 4 basic models and stacked a few models to improve the score. Below you can see the scores of the models after tuning the parameters:

The scores were obtained with 10-fold cross-validation to make sure they were reliable. I didn't choose the stacked model due to the complexity of the model and the longer running time.
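For readers unfamiliar with 10-fold cross-validation, here is a minimal NumPy sketch of the procedure. The least-squares linear model, the RMSE metric, the `k_fold_rmse` function and the synthetic data are all assumptions for illustration; the models and scoring used in the modeling notebook may differ.

```python
import numpy as np

def k_fold_rmse(X, y, k=10, seed=0):
    """Mean RMSE of a least-squares linear model over k shuffled folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        # Add an intercept column and fit on the training folds only.
        A_train = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
        coef, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)
        # Score on the held-out fold.
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        errors = A_test @ coef - y[test_idx]
        scores.append(np.sqrt(np.mean(errors ** 2)))
    return float(np.mean(scores))

# Synthetic data (made up): heart rate rises with steps, plus noise.
rng = np.random.default_rng(42)
steps = rng.uniform(0, 100, size=200)
heartrate = 60 + 0.5 * steps + rng.normal(0, 3, size=200)

score = k_fold_rmse(steps.reshape(-1, 1), heartrate)
print(score)
```

Every datapoint is used for testing exactly once, so the averaged score is less dependent on one lucky or unlucky split.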

Next, I selected a date and tried to predict the missing values. Below you can see the values of that day, where one line shows the actual data and the other shows the predicted values.

The Business Potential

There are two audiences to which this may appeal:

Fitbit
- Fitbit can use this to improve the accuracy of their heartrate device.
- It can decrease noise in the data by averaging badly measured heartrates with the prediction.
- It can be used to create warnings if a person's heartrate is predicted to increase or if it's increasing rapidly.
Users
- Gaps in the data may lead to customers thinking the device isn't accurate.
- Users want more features than are currently available (according to the forums)
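The warning idea from the Fitbit list could be sketched as a simple rule over per-minute readings. The `heartrate_alerts` function and its thresholds (150 bpm, a rise of 30 bpm within 5 minutes) are illustrative assumptions, not medical guidance and not an existing Fitbit feature.

```python
def heartrate_alerts(readings, max_bpm=150, max_rise=30, window=5):
    """Return (index, reason) pairs for readings that are too high or rising rapidly."""
    alerts = []
    for i, bpm in enumerate(readings):
        if bpm > max_bpm:
            alerts.append((i, "too high"))
        elif i >= window and bpm - readings[i - window] > max_rise:
            # Compare with the reading `window` minutes earlier.
            alerts.append((i, "rising rapidly"))
    return alerts

# Made-up per-minute readings (measured or predicted).
alerts = heartrate_alerts([70, 72, 75, 80, 90, 105, 120, 155])
print(alerts)
```

Fed with predicted values instead of measured ones, the same rule could raise a warning before the heartrate actually gets too high.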

The result would be as follows:

Notes
The following notebooks were used in the process:

"0. Cleaning Data.ipynb"
- This notebook is used to clean the data with the help of fitbit_helper.py
"1. Visualizations.ipynb"
- This notebook is used to visualize the data and contains most visuals used in the notebook
"2. Modeling.ipynb"
- This notebook is used to create and select the appropriate model