So now I've imported the data, but what is the data? For the last month I have worn a Fitbit Charge 2, which measures heart rate, steps, distance, sleep and many more things. I tried to wear it 24/7 and succeeded most of the time. My goal was to get more insight into my behavior, but mostly I did it because I like analyzing data, especially when it's my own. Moreover, I used an app called Moves that allows me to track my locations and save the data. Below you can see an example of the data that is tracked by Moves:
fh.print_columns(intraday, 'intraday')
fh.print_columns(sleep, 'sleep')
fh.print_columns(summaries, 'summaries')
For the data "intraday", we have 47520 datapoints and the following features:
- time
- floors
- distance
- calories
- steps
- heartrate

For the data "sleep", we have 33 datapoints and the following features:
- efficiency
- endTime
- minutesAsleep
- minutesAwake
- startTime
- time
- timeInBed

For the data "summaries", we have 31 datapoints and the following features:
- activityCalories
- calories
- caloriesBMR
- distance
- elevation
- floors
- minutesFairlyActive
- minutesLightlyActive
- minutesSedentary
- minutesVeryActive
- steps
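The `fh` helper module is not shown here, so the following is only a guess at what a function like `print_columns` might do. It is a minimal sketch assuming the data lives in pandas DataFrames; the name `describe_columns` and the demo frame are hypothetical.

```python
import pandas as pd


def describe_columns(df: pd.DataFrame, name: str) -> str:
    """Build a summary: the row count, then one bullet per column."""
    header = (
        f'For the data "{name}", we have {len(df)} datapoints '
        "and the following features:"
    )
    bullets = [f"- {col}" for col in df.columns]
    return "\n".join([header] + bullets)


# Tiny made-up frame, only to show the shape of the output.
demo = pd.DataFrame(
    {"time": ["00:00", "00:01"], "steps": [0, 12], "heartrate": [61.0, 64.0]}
)
print(describe_columns(demo, "demo"))
```

Returning a string rather than printing inside the function keeps the summary easy to reuse, for example in a report.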
There are three categories for me to work on:
In this notebook, I will focus mostly on the intraday features, because the main problem concerns the missing heart rate values and the trend of my heart rate.
It might be difficult to visualize what data I have, so I will do it for you!
Let's start by plotting all intraday data I have.
Intraday is the data that is measured in small time steps throughout the day, like the steps, floors, heartrate, distance and calories burned. The 47520 datapoints correspond to one row per minute over 33 days.
As you can see all intraday features follow a certain pattern. If my distance is higher, then it is likely I have taken some steps or I'm doing some active task which tends to increase my heartrate. This plot made me think that it should be possible to accurately predict your heartrate. But before that, let's take a look at the data in more detail.
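One way to put a number on how closely the intraday features move together is a correlation matrix. The sketch below uses synthetic data, not my Fitbit export; the stride length and the heart rate relationship are invented purely for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic minute-level data (made up): more steps means more distance
# and, with some noise, a higher heart rate.
rng = np.random.default_rng(0)
steps = rng.integers(0, 120, size=500)
toy = pd.DataFrame(
    {
        "steps": steps,
        "distance": steps * 0.00075,  # assumed stride of 0.75 m, expressed in km
        "heartrate": 60 + 0.4 * steps + rng.normal(0, 5, size=500),
    }
)

# Pearson correlation between every pair of features.
corr = toy.corr()
print(corr.round(2))
```

High correlations between heart rate and the other features are what make predicting heart rate from them plausible in the first place.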
My heart rate is mostly distributed around 60 beats per minute, which is a healthy resting heart rate for people my age. It is interesting to see that it goes up to 140, but never over that. Moreover, my heart rate peaks during the evening and around 11 am. During the evenings I'm often working out, which might explain the evening peak. During the day I work a lot and have more stress, which might explain the rise in heart rate around that time.
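The two views described here, a distribution and an hourly profile, can both be computed with a few lines of pandas. This is a sketch on a synthetic day of data with artificial bumps at 11:00 and in the evening; the column names `time` and `heartrate` follow the intraday features listed earlier, everything else is an assumption.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for one day of minute-level data; the values are made up.
times = pd.date_range("2018-01-01", periods=1440, freq="min")
bump = np.isin(times.hour, [11, 19, 20])  # artificial peaks
toy = pd.DataFrame({"time": times, "heartrate": (60 + 15 * bump).astype(float)})

# Distribution: how many minutes fall into each 10-bpm bin.
bins = pd.cut(toy["heartrate"], bins=np.arange(40, 151, 10))
distribution = bins.value_counts().sort_index()

# Hourly profile: mean heart rate for each hour of the day.
hourly_mean = toy.groupby(toy["time"].dt.hour)["heartrate"].mean()

print(distribution)
print(hourly_mean)
```

Either result can be handed straight to a plotting call such as `hourly_mean.plot()` to get the kind of chart the app does not offer.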
So far, we've seen four plots that all give a lot of information that Fitbit actually doesn't give you. The app has no distribution of heart rate, no hourly heart rate, no visual representation of activity and no overview of all data. It does a good job of showing some basics, but there are limits to customization. Next up is the main issue: missing data.
To reiterate the problem, in some cases Fitbit has trouble measuring the heart rate, which may lead to less accurate measurements and, most importantly, to missing data.
fh.count_missing_data(intraday)
| | Total | Percent |
|---|---|---|
| heartrate | 281 | 0.005913 |
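The internals of `fh.count_missing_data` are not shown, so this is only a sketch of how such a table could be built with pandas. The function name `count_missing` and the toy frame are hypothetical. Note that the "Percent" column holds a fraction: 281 / 47520 ≈ 0.005913, or roughly 0.6%.

```python
import numpy as np
import pandas as pd


def count_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Return the count and fraction of missing values for columns that have any."""
    total = df.isna().sum()
    table = pd.DataFrame({"Total": total, "Percent": total / len(df)})
    # Keep only columns with at least one missing value, largest first.
    return table[table["Total"] > 0].sort_values("Total", ascending=False)


# Made-up example: one of four heart rate readings is missing.
toy = pd.DataFrame(
    {"steps": [0, 5, 9, 0], "heartrate": [61.0, np.nan, 70.0, 66.0]}
)
missing_table = count_missing(toy)
print(missing_table)
```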
Above you can see that only a small fraction is missing (281 of 47520 datapoints, about 0.6%), but it is enough that you can see it when plotting the data and when viewing it in the app. So how would I solve this problem? There are many solutions for working with missing data:
I chose to predict the missing values, as I believe that would lead to the most accurate values. After some initial cleaning of the data I created 4 basic models and stacked a few models to improve the score. Below you can see the scores of the models after tuning the parameters:
The scores were obtained using 10-fold cross-validation to make sure they were reliable. I didn't choose the stacked model because of its complexity and longer running time.
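For readers unfamiliar with it, 10-fold cross-validation can be sketched with scikit-learn as below. The model (`Ridge`), the scoring metric (RMSE) and the synthetic data are all assumptions for illustration; they are not the models or scores from this notebook.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

# Synthetic regression problem (made up): three features, noisy linear target.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 3))
y = 60 + X @ np.array([8.0, 3.0, 1.0]) + rng.normal(0, 2, size=300)

# Split the data into 10 folds; each fold serves once as the validation set.
cv = KFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(
    Ridge(alpha=1.0), X, y, cv=cv, scoring="neg_root_mean_squared_error"
)

# scikit-learn negates error metrics so that higher is better; flip the sign back.
rmse_per_fold = -scores
print(rmse_per_fold.mean(), rmse_per_fold.std())
```

Averaging over the folds gives a steadier estimate than a single train/test split, and the spread across folds hints at how much the score depends on which rows end up in the validation set.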
Next, I selected a date and tried to predict the missing values. Below you can see the values of that day, with the actual data and the predicted values drawn as separate lines.
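The general idea of filling gaps by prediction is: train on the rows where the heart rate is known, then predict it for the rows where it is missing. The sketch below shows that idea with a random forest on synthetic data. The model choice, the feature set and the data are assumptions, not the tuned models described above.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic minute-level data (made up), with 20 heart rate values knocked out.
rng = np.random.default_rng(1)
steps = rng.integers(0, 120, size=400)
toy = pd.DataFrame(
    {
        "steps": steps,
        "calories": 1.0 + 0.05 * steps,
        "heartrate": 60 + 0.5 * steps + rng.normal(0, 3, size=400),
    }
)
toy.loc[toy.sample(20, random_state=0).index, "heartrate"] = np.nan

features = ["steps", "calories"]
known = toy[toy["heartrate"].notna()]
missing = toy[toy["heartrate"].isna()]

# Fit on rows with a measured heart rate, then predict the gaps.
model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(known[features], known["heartrate"])

filled = toy.copy()
filled.loc[missing.index, "heartrate"] = model.predict(missing[features])
print(filled["heartrate"].isna().sum())
```

Only the missing rows are overwritten, so every measured value stays exactly as recorded.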
There are two audiences to which this may appeal:
The result would be as follows:
Notes
The following notebooks were used in the process: