Identifying Injury Risk Factors for Elite Soccer Teams Using Survival Analysis

Bibliographic details
Main author:	Jarmann, Anna Linnea
Format:	Dissertation
Language:	nor

Description
Summary:	In this thesis, we experiment with survival analysis techniques to identify significant injury risk factors in subjective training load and wellness data from elite female soccer teams. We investigate the effect of dataset size and of the number of covariates used, and how missing data should be handled. The univariate survival analysis models Kaplan-Meier, Weibull and Piecewise Exponential are used to check whether the dataset is suited for survival analysis. We use the Cox Proportional Hazards Model to estimate the significance of each covariate, and we apply regularisation with different penalty terms to extract the most essential factors. We also investigate the use of the Cox Time-Varying Model with time-dependent covariates. Additionally, we compare first injuries with recurrent injuries, and day-of-the-event values with values averaged over all prior days. Our results show that the Cox Proportional Hazards Model, regularised with a low penalty term and combined with a threshold of 10%, identifies the most critical injury risk factors: prior injury, sleep quality, fatigue and ACWR (acute:chronic workload ratio). Smaller datasets limit the number of covariates that can be included and yield unreliable covariate estimates. Larger datasets, however, yield more certain estimates and injury risk factors that agree with previous research, while also allowing for more covariates. In multivariate models, the number of covariates affects the outcome, as using too many can cause multicollinearity. Missing data at the start and end of a player's dataset are cut off. Randomly scattered missing data points are replaced by zeros, except when averaged values over whole durations are used, in which case the missing data points are left as is.
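The missing-data rules described above can be sketched in a few lines. This is a minimal illustration, not the thesis's own code: the function name `trim_and_fill`, the use of `None` to mark a missing daily report, and the sample values are all assumptions made for the example. It covers only the trim-and-zero-fill case, not the averaged-values case where gaps are left as is.

```python
from typing import List, Optional


def trim_and_fill(values: List[Optional[float]], fill: float = 0.0) -> List[float]:
    """Cut off missing entries at the start and end, then fill interior gaps.

    `values` is one player's daily series for a single covariate
    (hypothetical layout); `None` marks a day with no report.
    """
    # Indices of the days that actually have a reported value.
    present = [i for i, v in enumerate(values) if v is not None]
    if not present:
        return []  # nothing was reported, so nothing remains after trimming

    first, last = present[0], present[-1]
    trimmed = values[first:last + 1]  # drop leading and trailing gaps

    # Scattered gaps inside the reported range are replaced by zeros.
    return [fill if v is None else v for v in trimmed]


# Hypothetical example series: two missing days at the start,
# one scattered gap, one missing day at the end.
series = [None, None, 3.0, None, 4.0, 2.0, None]
print(trim_and_fill(series))
```

Filling with zeros keeps the series on a regular daily grid, at the cost of treating an unreported day as a zero value; whether that is appropriate depends on the covariate.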
We found that using recurrent injuries better reflects the real world, as soccer players commonly sustain multiple injuries. It also has the benefit of providing more data for analysis. Averaged covariate values over whole duration intervals are preferred because they can be combined with recurrent injuries and capture changes in the values over time. In terms of computer science, this thesis contributes to data processing, the handling of missing data and small datasets, survival analysis, feature selection and regularisation. Our research also provides alternative methods to evaluate risk an