Development and validation of a stacking ensemble model for death prediction in the Chinese Longitudinal Healthy Longevity Survey (CLHLS)
Published in: | Maturitas 2024-04, Vol.182, p.107919-107919, Article 107919 |
---|---|
Main authors: | , , , , , , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | • A well-performing all-cause mortality prediction model was established using the stacking ensemble method. • SHapley Additive exPlanations (SHAP) values were introduced to assess feature contributions. • Several variables that contributed most to a higher risk of death in the elderly were revealed.
This study aimed to develop and validate a mortality risk prediction model for older people based on the Chinese Longitudinal Healthy Longevity Survey using the stacking ensemble strategy.
A total of 12,769 participants aged 65 or older at baseline were included. Ensemble machine learning was applied to develop a mortality prediction model. We selected three base learners (logistic regression, eXtreme Gradient Boosting (XGBoost), and Categorical Boosting (CatBoost)) and used logistic regression as the meta-learner. The primary outcome was five-year survival. Variable importance was evaluated with the SHapley Additive exPlanations (SHAP) method.
The mean age at baseline was 88, and 57.8 % of participants were women. The CatBoost model performed best among the three base learners, with an area under the receiver operating characteristic curve (AUC) of 0.8469 (95 % CI: 0.8345–0.8593), and the stacking ensemble model further improved discrimination (AUC = 0.8486, 95 % CI: 0.8367–0.8612, P = 0.046). Conventional logistic regression had comparable performance (AUC = 0.8470, 95 % CI: 0.8346–0.8595). Older age, higher scores for self-care activities of daily living, being male, higher objective physical performance capacity scores, not undertaking housework, and lower scores on the Mini-Mental State Examination contributed to higher risk.
We successfully constructed and validated several mortality risk prediction models for a Chinese population of older adults. Although the stacking ensemble approach had the best predictive performance, its improvement over conventional logistic regression was not substantial. |
---|---|
ISSN: | 0378-5122 1873-4111 |
DOI: | 10.1016/j.maturitas.2024.107919 |
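
The summary reports discrimination as an AUC with a 95 % confidence interval. As a rough illustration of what that metric measures, the sketch below computes the AUC as the share of correctly ordered positive-negative pairs and attaches a percentile bootstrap interval. The record does not state how the study derived its intervals, so the bootstrap is an assumption here; the function names `auc` and `bootstrap_ci` and the toy labels and scores are hypothetical and are not study data.

```python
import random


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            # A resample with only one class has no defined AUC; draw again.
            continue
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical toy data: 1 = died within follow-up, score = predicted risk.
toy_labels = [0, 0, 1, 0, 1, 1, 0, 1]
toy_scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.2, 0.9]

toy_auc = auc(toy_labels, toy_scores)
toy_low, toy_high = bootstrap_ci(toy_labels, toy_scores)
print(f"AUC = {toy_auc:.4f} (95% CI: {toy_low:.4f}-{toy_high:.4f})")
```

In the toy data, 15 of the 16 positive-negative pairs are ordered correctly, so the point estimate is 0.9375. With so few cases the interval is very wide, which is one reason large cohorts are needed to separate models whose AUCs differ only in the third decimal place.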