The Problem
In the last few years, wearables have taken the market by storm. They come in the form of watches, activity trackers, apps for mobile devices and more. Fitbit, Apple, TomTom, Garmin, Samsung and many other companies have taken it upon themselves to release the most advanced wearable in the form of a watch that has the best user experience.
Similarly, machine learning (which consists of predicting events, data etc.) is growing in popularity and is being implemented more and more. For example, we can use machine learning to predict how well you're going to run based on what you've eaten in the past and how that affected your running.
I would like to propose something different that concerns both user experience and hardware issues. A known problem of activity trackers that measure your heart rate is that the measurements are inconsistent and often missing altogether. This can be due to hairy arms, sweating while working out or wearing the device in the shower. The user may then see that some data is missing, which can be frustrating. Below you can see an example in my data. It shows that there is no heartrate for roughly an hour even though I was wearing the device! I was working out and sweating quite a lot, which may have been what caused the difficulties.
Moreover, many people deal with a dangerously high heartrate due to certain patterns in their behaviour. Currently, Fitbit doesn't tell you anything if your heartrate is too high. You have to figure that out yourself.
The Solution
Here, I'm not proposing a way to accurately measure the heartrate, but a model that can predict the missing values based on past information about your heartrate, steps, distance etc. This will allow users to see a filled graph of the daily heartrate without any missing values.
Companies developing wearables can use the model to improve the user experience. Moreover, predicting heartrate will allow the device to recommend certain actions to prevent a heartrate that is too high by simply sending an alert.
Note: If a wearable watch measures heartrate incorrectly (which is difficult to establish), then the mean of the tracker's measurement and the algorithm's prediction may be used to improve the accuracy of the measurement. However, this works best if the device can recognize that it has difficulties measuring heartrate.
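To make the note above concrete, here is a minimal sketch of taking the mean of the tracker and the prediction. The function name `blend_heartrate`, the `unreliable` flag and all the numbers are made up for illustration; a real device would have to supply its own signal for when the sensor is struggling.

```python
import numpy as np

def blend_heartrate(measured, predicted, unreliable):
    """Average measured and predicted heartrate where the sensor is flagged as unreliable.

    All three inputs are hypothetical: `unreliable` stands in for whatever
    signal a device might use to say "I had trouble measuring here".
    """
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    unreliable = np.asarray(unreliable, dtype=bool)

    blended = measured.copy()
    # Only touch the flagged readings; trusted readings stay as measured.
    blended[unreliable] = (measured[unreliable] + predicted[unreliable]) / 2.0
    return blended

# Toy values: the third reading is flagged as unreliable.
measured = [62, 64, 120, 66]
predicted = [61, 65, 70, 67]
unreliable = [False, False, True, False]
blended = blend_heartrate(measured, predicted, unreliable)
print(blended)
```

Readings that are not flagged are left untouched, which is why the approach depends on the device knowing when it is having trouble.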
The Code
In order to show you how the algorithm works and what it can do, I have to use some (depending on your background) difficult code. Don't worry, I will explain the general principles and focus mostly on the result.
The code used for cleaning the data can be found in "0. Cleaning Data" if you are interested. Below, I will simply load in the data after cleaning.
import seaborn as sns
import time
import pandas as pd
import fitbit_helper as fh
import numpy as np
%matplotlib inline
%load_ext rpy2.ipython
intraday = pd.read_csv('../Files/intraday.csv')
summaries = pd.read_csv('../Files/summaries.csv')
sleep = pd.read_csv('../Files/sleep.csv')
So now I've imported the data, but what is the data? For the last month I have worn a Fitbit Charge 2 that measures heartrate, steps, distance, sleep and many more things. I tried to wear it 24/7 and succeeded most of the time. My goal was to get more insight into my behavior, but mostly I did it because I like analyzing data, especially when it's my own. Moreover, I used an app called Moves that allows me to track my locations and save the data. Below you can see an example of the data that is tracked by Moves:
My measured heart rate is still the main focus of this notebook; the map merely shows what you can do using a simple app and some ggplot2 code.
Now, let's see what data I have after importing and cleaning.
fh.print_columns(intraday, 'intraday')
fh.print_columns(sleep, 'sleep')
fh.print_columns(summaries, 'summaries')
For the data "intraday", we have 47520 datapoints and the following features:
- time
- floors
- distance
- calories
- steps
- heartrate

For the data "sleep", we have 33 datapoints and the following features:
- efficiency
- endTime
- minutesAsleep
- minutesAwake
- startTime
- time
- timeInBed

For the data "summaries", we have 31 datapoints and the following features:
- activityCalories
- calories
- caloriesBMR
- distance
- elevation
- floors
- minutesFairlyActive
- minutesLightlyActive
- minutesSedentary
- minutesVeryActive
- steps
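If you are curious how an overview like this can be produced, a helper along these lines would do it. This is only a guess at what `fh.print_columns` does, not the actual code in `fitbit_helper`; the name `describe_columns` and the toy dataframe are made up.

```python
import pandas as pd

def describe_columns(df, name):
    """Build a short text overview of a dataframe: row count plus column names.

    Hypothetical stand-in for fh.print_columns; the real helper may differ.
    """
    lines = [f'For the data "{name}", we have {len(df)} datapoints and the following features:']
    lines += [f"- {column}" for column in df.columns]
    return "\n".join(lines)

# Toy data, not the real intraday file.
toy = pd.DataFrame({
    "time": ["00:00", "00:01"],
    "steps": [0, 12],
    "heartrate": [58, 61],
})
summary = describe_columns(toy, "toy")
print(summary)
```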
There are three categories for me to work on: intraday, sleep and summaries.
In this notebook, I will be focusing mostly on intraday features as the main problem concerns the missing heartrates and looking at the trend of heart rate.
It might be difficult to visualize what data I have, so I will do it for you!
Let's start by plotting all intraday data I have.
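Intraday is the data that is measured at short intervals throughout the day, like the steps, floors, heartrate, distance and calories burned. The 47520 datapoints are exactly 33 days of 1440 readings, which suggests one reading per minute.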
As you can see all intraday features follow a certain pattern. If my distance is higher, then it is likely I have taken some steps or I'm doing some active task which tends to increase my heartrate. This plot made me think that it should be possible to accurately predict your heartrate. But before that, let's take a look at the data in more detail.
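One quick way to check whether features move together like this is a correlation matrix. The sketch below uses made-up numbers in place of the real intraday data, with column names taken from the list of intraday features.

```python
import pandas as pd

# Toy readings (invented): two active moments surrounded by rest.
toy = pd.DataFrame({
    "steps":     [0, 0, 50, 100, 20, 0],
    "distance":  [0.0, 0.0, 0.04, 0.08, 0.015, 0.0],
    "heartrate": [58, 60, 85, 110, 70, 62],
})

# Pearson correlation between every pair of columns.
corr = toy.corr()
print(corr.round(3))
```

On real data the correlations will be weaker than in this toy example, but a clearly positive value between steps, distance and heartrate is what makes predicting heartrate from the other features plausible.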
My heart rate is mostly distributed around 60 beats per minute, which is a healthy resting heart rate for people my age. It is interesting to see that it goes up to 140, but never over that. Moreover, my heart rate peaks during the evening and around 11 am. During the evenings I'm often working out, which might explain the peak. During the day I work a lot and have more stress, which might explain the increasing heart rates.
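An hourly view like the one described above comes down to grouping by the hour of the day. Here is a small sketch with invented timestamps and values; it assumes the `time` column can be parsed as a datetime, which may not match how the real file stores it.

```python
import pandas as pd

# Toy readings on a made-up date.
toy = pd.DataFrame({
    "time": pd.to_datetime([
        "2017-01-01 08:00", "2017-01-01 08:30",
        "2017-01-01 11:00",
        "2017-01-01 20:15", "2017-01-01 20:45",
    ]),
    "heartrate": [60, 64, 90, 120, 130],
})

# Average heartrate per hour of the day, and the hour where it peaks.
hourly_mean = toy.groupby(toy["time"].dt.hour)["heartrate"].mean()
peak_hour = hourly_mean.idxmax()
print(hourly_mean)
print("Peak hour:", peak_hour)
```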
So far, we've seen four plots that all give a lot of information that Fitbit actually doesn't give you. The app has no distribution of heartrate, no hourly heartrate, no visual representation of activity and no overview of all data. It does a good job of showing some basics, but there are limitations to customization. Next up is the main issue: missing data.
To reiterate the problem, in some cases fitbit has trouble measuring the heartrate which may lead to less accurate measurements, but most importantly, missing data.
fh.count_missing_data(intraday)
| | Total | Percent |
|---|---|---|
| heartrate | 281 | 0.005913 |
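A count like this takes only a few lines of pandas. The sketch below is an assumption about what `fh.count_missing_data` does, not the actual helper; the name `missing_summary` and the toy data are made up. Note that the "Percent" column holds a fraction: 281 / 47520 is about 0.005913, so roughly 0.59% of the datapoints.

```python
import numpy as np
import pandas as pd

def missing_summary(df):
    """Count missing values per column and express them as a fraction of all rows.

    Hypothetical stand-in for fh.count_missing_data; the real helper may differ.
    """
    total = df.isnull().sum()
    fraction = total / len(df)
    summary = pd.DataFrame({"Total": total, "Percent": fraction})
    # Keep only columns that actually have missing values, largest first.
    return summary[summary["Total"] > 0].sort_values("Total", ascending=False)

# Toy data: two of four heartrate readings are missing.
toy = pd.DataFrame({
    "steps": [0, 1, 2, 3],
    "heartrate": [60.0, np.nan, 62.0, np.nan],
})
summary = missing_summary(toy)
print(summary)
```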
The missing-data count shows that only a small fraction of the heartrate values is missing (281 of 47520 datapoints, roughly 0.6%), but it is enough that you can see it when plotting the data and when viewing it in the app. So how would I solve this problem? There are many solutions for working with missing data:
I chose predicting the missing values as I believe that would lead to the most accurate values. After some initial cleaning of the data I created four basic models and stacked a few models to improve the score. Below you can see the scores of the models after tuning the parameters:
The scores were obtained by doing 10-fold cross-validation to make sure they were accurate. I didn't choose the stacked model due to the complexity of the model and the longer running time.
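For readers who haven't seen 10-fold cross-validation before, here is a minimal sketch using scikit-learn. The data is synthetic, the linear model is a placeholder rather than one of the models used in this notebook, and mean absolute error is an assumed metric.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic data (invented): heartrate rises with steps, plus some noise.
rng = np.random.default_rng(0)
steps = rng.integers(0, 120, size=200)
heartrate = 60 + 0.4 * steps + rng.normal(0, 3, size=200)
X = steps.reshape(-1, 1).astype(float)

# Split the data into 10 folds; each fold is held out once for scoring.
cv = KFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(
    LinearRegression(), X, heartrate,
    cv=cv, scoring="neg_mean_absolute_error",
)

# scikit-learn reports errors as negative scores, so flip the sign.
mean_mae = -scores.mean()
print("Mean absolute error over 10 folds:", round(mean_mae, 2))
```

Averaging over ten held-out folds gives a more trustworthy score than a single train/test split, at the cost of fitting the model ten times. That cost is also why a slow stacked model becomes less attractive.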
Next, I selected a date and tried to predict the missing values. Below you can see the values of that day, where one line is the actual data and the other line is the prediction.
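The fill-in step itself can be sketched like this: train on the rows where heartrate is known, then predict the rows where it is missing. The numbers are invented and a plain linear regression is swapped in for simplicity; it is not one of the tuned models from this notebook.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Toy minute-level data with two missing heartrate readings.
toy = pd.DataFrame({
    "steps":     [0, 10, 20, 30, 40, 50],
    "calories":  [1.0, 1.5, 2.5, 2.0, 3.0, 3.5],
    "heartrate": [62.0, 68.0, np.nan, 79.0, np.nan, 92.0],
})
features = ["steps", "calories"]

# Train only on rows where the heartrate was actually measured.
known = toy[toy["heartrate"].notnull()]
missing = toy[toy["heartrate"].isnull()]
model = LinearRegression().fit(known[features], known["heartrate"])

# Fill the gaps with the model's predictions; measured values stay as they are.
filled = toy.copy()
filled.loc[missing.index, "heartrate"] = model.predict(missing[features])
print(filled)
```

The toy heartrate follows an exact linear rule, so the gaps are filled perfectly here. Real data is noisier, which is where the choice of model starts to matter.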
There are two audiences to which this may appeal:
The result would be as follows:
Notes
The following notebooks were used in the process: