The Canadian National Women’s Rugby Team is interested in the relationship between training load, performance and wellness in Rugby 7s. Rugby 7s is a fast-paced, physically demanding sport that pushes the limits of athlete speed, endurance and toughness. Rugby 7s players may play in up to three games in a day, resulting in a tremendous amount of athletic exertion. Substantial exertion results in fatigue, which may lead to physiological deficits (e.g., dehydration), reduced athletic performance, and greater risk of injury.
Despite the importance of managing training load in professional athletics, very little is known about its effects, and many training decisions are based on “gut feel.” Currently, training load is measured through a combination of subjective measurements (asking players how hard they worked) and objective measurements from wearable technology. Wellness is typically estimated by asking players how they feel in wellness surveys. However, there is no agreed-upon standard of defining wellness so the relationship between training load, performance and wellness is unclear.
(Notebook: WellnessMeasure.ipynb, Data: Wellness)
The original wellness dataset consists of 19 columns: 11 quantitative measures, 6 qualitative measures, and the two identifier columns Date and PlayerID. Our aim for this dataset is to derive one or more quantitative summary measures. For the meaning of each variable, please refer to the CodeBook.
Note: We took two routes. One was factor analysis, which can be found in the notebook Wellness-FactorAnlaysis.ipynb; the other is described in this report. The approach described here turns out to be better, in that its summary measures are less correlated. The idea was to produce a single quantitative value while keeping the categorical variables intact.
For a complete account of how we cleaned the data, please refer to the notebook WellnessMeasure.ipynb. There, we scale each quantitative variable to a 1-7 range and sum them to obtain a WellnessScore. But this score treats days as independent, which they are not: intuitively, today's wellness depends on how we felt yesterday, or even a week before. So we define a new score by applying an exponential moving average to the WellnessScore. Each player's EWMScore now depends on a week of WellnessScores, with exponentially decreasing weights as we move further back in time. The dataset looks like this:
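As a rough sketch, the EWMScore computation might look like the following; the 7-day span and the toy numbers are assumptions, so see WellnessMeasure.ipynb for the actual code:

```python
import pandas as pd

# Toy wellness data for one player; the real data sums 11 scaled
# quantitative columns to get WellnessScore.
df = pd.DataFrame({
    "PlayerID": [1] * 5,
    "Date": pd.date_range("2018-01-01", periods=5),
    "WellnessScore": [30.0, 32.0, 28.0, 35.0, 31.0],
})

# EWMScore: exponentially weighted average of WellnessScore per player,
# with a span of 7 so roughly a week of history carries most of the weight.
df["EWMScore"] = (
    df.groupby("PlayerID")["WellnessScore"]
      .transform(lambda s: s.ewm(span=7).mean())
)
print(df[["Date", "WellnessScore", "EWMScore"]])
```

The `groupby` before `ewm` matters: each player's smoothed score should only depend on her own history, not on rows from other players.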
Let us visualize the scores to see what this means in practice. The comparison of WellnessScore and EWMScore for the player with ID 1 is as follows:
(Notebook: RPE_cleaning.ipynb, Data: RPE)
The RPE/training load data consists of 14 columns: 7 quantitative measures and 5 qualitative measures, plus PlayerID and Date. Our aim was again to derive one or more quantitative summary measures for the dataset. For an in-depth description of each variable, please refer to the CodeBook.
Note: We again took two routes. One was factor analysis (found in the notebook RPE_cleaning.ipynb); the other is described below. However, factor analysis did not work well here: when we plot the new factors against AccelerationImpulse, we see no clear pattern. Hence, we decided to come up with a new performance measure instead of AccelerationImpulse.
First, we identified all the missing values for each date and cleaned the data. We dropped the columns ObjectiveRating, BestoutofSelf, and FocusRating, since they contain too many missing values, and then removed all rows that still had missing values. At this point, the dimension of the RPE dataset was (7621, 11). Secondly, some players completed the same training session multiple times; we averaged their values over each session type. This reduced the dimension to (7192, 11).
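The cleaning steps above can be sketched as follows (toy rows, not the real data; see RPE_cleaning.ipynb for the full pipeline):

```python
import pandas as pd

# Toy RPE rows; the real data has 14 columns (see the CodeBook).
rpe = pd.DataFrame({
    "PlayerID":        [1, 1, 1, 2],
    "Date":            ["2018-01-01"] * 4,
    "SessionType":     ["Skills", "Skills", "Speed", "Skills"],
    "RPE":             [5.0, 7.0, None, 6.0],
    "ObjectiveRating": [None, None, None, None],  # mostly missing -> dropped
})

# Drop the columns with too many missing values (the report also drops
# BestoutofSelf and FocusRating), then the remaining incomplete rows.
rpe = rpe.drop(columns=["ObjectiveRating"]).dropna()

# A player can repeat the same session type in a day: average those rows.
rpe = (rpe.groupby(["PlayerID", "Date", "SessionType"], as_index=False)
          .mean(numeric_only=True))
print(rpe)
```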
(Notebook: PerformanceMeasure.ipynb, Data: GPS,Games)
The aim for this dataset is to find a single performance measure per game for each player. We first considered using each player's highest acceleration value, but it showed little variation, and all players' performances looked very similar.
We then came up with the idea of having a PerformanceScore, which was a score on the scale of 0-100 and would be calculated in the following manner:
We were also curious whether the outcome of the game or the points difference had any impact on the players: do players perform better when their team is under pressure with a large points difference, and do they perform better or worse when the team is winning versus losing? So we added those measures to the dataset using the games data. Our final dataset looked like this:
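A minimal sketch of joining the games data onto the per-player rows; the column names GameID, TeamPoints, and OpponentPoints are illustrative assumptions, not from the CodeBook:

```python
import pandas as pd

# Per-player game rows and a toy games table (names are assumptions).
perf = pd.DataFrame({"PlayerID": [1, 2], "GameID": [10, 10],
                     "PerformanceScore": [55.0, 62.0]})
games = pd.DataFrame({"GameID": [10],
                      "TeamPoints": [21], "OpponentPoints": [14]})

# Attach game context to every player row for that game.
merged = perf.merge(games, on="GameID", how="left")
merged["PointsDiff"] = merged["TeamPoints"] - merged["OpponentPoints"]
merged["Outcome"] = (merged["PointsDiff"] > 0).map({True: "W", False: "L"})
print(merged)
```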
The PerformanceScore we came up with looks decent with values ranging from 17 all the way up to 80. The distribution of PerformanceScore is as follows:
We can also see from the player-wise PerformanceScore below that there is far more variability in our performance measure now. Note that we removed players 18-21, as they did not play any games.
After we came up with the new performance measure, we investigated its relationship with some of the other variables. Here is an example: we would like to see whether load status affects a player's performance. We conducted the Tukey test, a pairwise comparison of group means.
As we can see, the test between the normal and recovering groups suggests no difference in group means. However, the high group has a significant mean difference compared to both the normal and recovering groups. The mean difference (the other group's mean minus the high group's mean) is negative, which means a player with a high training load tends to have a higher performance score than players in the other two groups.
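The Tukey test can be run with `pairwise_tukeyhsd` from statsmodels; the data below is synthetic, built so that a "high" group with a clearly higher mean gets flagged, which mirrors the pattern we report:

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(0)
# Synthetic scores mimicking the three Load_Status groups.
scores = np.concatenate([rng.normal(60, 5, 30),    # high
                         rng.normal(50, 5, 30),    # normal
                         rng.normal(50, 5, 30)])   # recovering
groups = ["high"] * 30 + ["normal"] * 30 + ["recovering"] * 30

# In the output table, meandiff is (group2 mean - group1 mean), so a
# negative value in the (high, normal) row means "high" scored higher.
result = pairwise_tukeyhsd(scores, groups, alpha=0.05)
print(result.summary())
```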
(Notebook: Model_Selection, Data: merged_data)
The typical regression model assumes all rows are independent. However, in our dataset we have multiple observations (rows) per player, and these rows are dependent, so we should not use ordinary linear regression here.
Linear mixed models are an extension of simple linear models that allow both fixed and random effects, and are particularly useful when there is non-independence in the data, such as arises from a hierarchical structure.
When there are multiple levels, such as performances on different dates from the same player, the variability in the outcome can be thought of as either within-group or between-group. Performance-level observations are not independent: within a given player, performances are more similar to each other.
The advantages of using mixed effects models are:
We can keep as many rows of data as possible, instead of aggregating by player. Although aggregate analysis yields consistent effect estimates and standard errors, it does not take full advantage of the data, because each player's rows are simply averaged.
Even if we ran a separate regression for each player, each model would ignore the information in the other players' data, which can introduce bias since each model considers only one player.
It keeps the flexibility in treating time as continuous.
A typical regression model:
$Y = X\beta + \epsilon$, where $X$ holds the fixed effects and $\epsilon$ is random noise.
Linear Mixed Effects Model:
$Y = X\beta + Zu + \epsilon$

where $Y$ is an $N \times 1$ column vector of the outcome variable, $X$ is an $N \times p$ matrix of the $p$ predictor variables, $\beta$ is a $p \times 1$ column vector of the fixed-effects regression coefficients, $Z$ is an $N \times q$ design matrix for the random effects, $u$ is a $q \times 1$ vector of random effects complementing the fixed $\beta$, and $\epsilon$ is an $N \times 1$ column vector of the residuals.
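A random-intercept-per-player model of this form can be fit with statsmodels; the data below is synthetic (the variable names follow the report, but the coefficients are made up for illustration):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_players, n_games = 10, 8
# Repeated games per player, with a random player-level intercept
# creating the within-player dependence that motivates the mixed model.
players = np.repeat(np.arange(n_players), n_games)
player_effect = rng.normal(0, 4, n_players)[players]
ewm = rng.normal(30, 3, n_players * n_games)
perf = 70 - 0.4 * ewm + player_effect + rng.normal(0, 2, len(ewm))
data = pd.DataFrame({"PlayerID": players, "EWMScore": ewm,
                     "PerformanceScore": perf})

# Random intercept per player; statsmodels uses REML estimation by default.
model = smf.mixedlm("PerformanceScore ~ EWMScore", data,
                    groups=data["PlayerID"]).fit()
print(model.summary())
```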
Our goal is to predict a player's overall performance from the wellness, RPE, and GPS variables described above. We would like to build a multiple linear mixed effects model on the variables that significantly affect performance. To achieve this, we applied Backward Elimination: first build a full model on all variables; then consider the feature with the highest p-value, and if that p-value exceeds the significance level (we fix it at 0.1), eliminate the variable and refit. Once every remaining variable has p-value < 0.1, the model is ready.
The reason for choosing Backward Elimination:
In a linear mixed effects model, a useful criterion for how good the model is REML, restricted (residual) maximum likelihood. We will use this criterion to determine whether a reduced model is as good as the full model.
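The elimination loop described above can be sketched as follows. This is a simplified version under stated assumptions: each candidate here is a single numeric term, whereas in the report a categorical variable is kept whenever at least one of its levels is significant, and the data is synthetic:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def backward_eliminate(data, response, candidates, groups, alpha=0.1):
    """Refit, dropping the least significant term, until all p-values < alpha."""
    kept = list(candidates)
    while kept:
        formula = response + " ~ " + " + ".join(kept)
        fit = smf.mixedlm(formula, data, groups=data[groups]).fit()
        pvals = fit.pvalues[kept]        # ignore intercept / group variance
        worst = pvals.idxmax()
        if pvals[worst] < alpha:
            break                        # everything left is significant
        kept.remove(worst)
    return kept

rng = np.random.default_rng(2)
n = 200
players = rng.integers(0, 10, n)
data = pd.DataFrame({
    "PlayerID": players,
    "EWMScore": rng.normal(30, 3, n),
    "Noise": rng.normal(0, 1, n),        # unrelated; should be eliminated
})
data["PerformanceScore"] = (70 - 0.4 * data["EWMScore"]
                            + rng.normal(0, 3, 10)[players]
                            + rng.normal(0, 2, n))

selected = backward_eliminate(data, "PerformanceScore",
                              ["EWMScore", "Noise"], groups="PlayerID")
print(selected)
```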
Let's take a look at our full model first:
As we can see, ChronicLoad has a p-value of 0.992, far above 0.1, so we drop it for the next model. Keep in mind that the REML criterion is about 58.6.
Since the process is repetitive, please see our notebook for the details. We drop SessionType, Menstruation, RPE, SessionLoad, and Duration step by step. Note that even if some levels of a categorical variable have insignificant p-values, we still keep the variable as long as at least one level is significant.
With this model, our REML criterion climbs to 58.9, and every variable, or at least one level of each categorical variable, has p-value < 0.1. We confirm this as our final model.
Some Interpretations:
The coefficients for the two non-baseline levels of Load_Status are both about -8. The baseline level is high, so if a player did high-load training before the game, her performance score is about 8 points higher than after normal training or recovering.
The coefficients for the two non-baseline levels of Illness are about 5.67 and -9. The baseline level is No, so if a player is ill, her performance score is lower by about 9 points. Notably, if a player feels slightly off, her performance score is actually about 5.67 points higher.
EWMScore has a coefficient of -0.412, indicating a negative relationship between EWMScore and performance. This makes sense: a high EWMScore means less soreness and fatigue, which suggests the training load was not high, so it is consistent with the first interpretation.
We plot the residuals versus the fitted values to verify that the residuals are random, and use a qq-plot to check the normality assumption.
As we can see, normality looks good, as the points lie roughly on or close to the line. The residuals appear random, with no apparent pattern or polynomial trend.
After establishing some associations between performance and the other variables in our model, we include some visualizations to verify our results. Here we plot Load Status, Illness, Pain, and NutritionAdjustment on the x-axis, with performance score as the response. As the box plots show, players with a high training load perform significantly better than those in the other two statuses. Players who feel slightly off in fact perform slightly better than the other two groups! Also, we have only one observation for an ill player, suggesting that a player who feels ill will probably sit out the next day's games. A player who reports pain usually performs better; this is reasonable, since pain is highly correlated with load status: when the load status is high, most players report pain. Lastly, players who made a nutrition adjustment perform better than those who did not.
We would suggest that players do more acute exercise and less chronic exercise, so that the acute:chronic ratio is higher. This way, they would reach a high training-load status and, in turn, better performance.
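For illustration, the acute:chronic workload ratio is commonly computed as a 7-day rolling load over a scaled 28-day rolling load; the sketch below assumes that common definition, and the team's exact formula (see the CodeBook) may differ:

```python
import pandas as pd

# Daily session loads for one player: four weeks of steady training,
# then a heavier final week, so the ratio rises above 1.
load = pd.Series([300.0] * 28 + [500.0] * 7)

acute = load.rolling(7).sum()            # last week's total load
chronic = load.rolling(28).sum() / 4     # 4-week load, scaled to a week
ratio = acute / chronic
print("acute:chronic on the last day:", ratio.iloc[-1])
```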
NutritionAdjustment is important for a player's performance. A player who is self-aware about nutrition adjustment seems to perform better than those who don't know about their nutrition levels. In our model, a player who made a nutrition adjustment on the training day scores about 3 points higher on our performance scale.
We should avoid including game result and points difference when predicting future performance.
A player may have multiple types of training sessions on the day before a game. We could build regression models grouped by session type.
Better or more complete data, such as heart rate, would have allowed us to use models like the Banister model and could have yielded better summary measures.
Navlani, A. (2019). Introduction to Factor Analysis. DataCamp. https://www.datacamp.com/community/tutorials/introduction-factor-analysis
Hajduk, G. K. (2017). Introduction to Linear Mixed Models. https://ourcodingclub.github.io/2017/03/15/mixed-models.html
Gelman, A. & Hill, J. (2006). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.
Singer, J. D. & Willett, J. B. (2003). Applied Longitudinal Data Analysis: Modeling Change and Event Occurrence. Oxford University Press.