Predicting death by suicide using administrative health care system data: Can recurrent neural network, one-dimensional convolutional neural network, and gradient boosted trees models improve prediction performance?

Bibliographic details
Published in: Journal of Affective Disorders, 2020-03, Vol. 264, p. 107-114
Main authors: Sanderson, Michael, Bulloch, Andrew GM, Wang, JianLi, Williamson, Tyler, Patten, Scott B
Format: Article
Language: English
Description
Summary:
•The optimal gradient boosted trees model configuration (AUC: 0.8493) outperformed the optimal recurrent neural network model configuration (AUC: 0.8407), one-dimensional convolutional neural network configuration (AUC: 0.8419), and logistic regression (AUC: 0.8179).
•Gradient boosted trees were the least computationally expensive model class, and the optimal configuration also achieved better calibration than logistic regression. The gradient boosted trees model class appeared to be the most promising model class for future research.
•Performance increased as more quarters were included in the analytic dataset until a maximum was reached, after which additional quarters slowly decreased performance. The risk state over the past year appears to be the most important for quantifying suicide risk, and considering risk states over longer time periods will not result in improvements in quantifying suicide risk.

Suicide is a leading cause of death, particularly in younger persons, and this results in tremendous years of life lost.

The objective was to compare the performance of recurrent neural networks, one-dimensional convolutional neural networks, and gradient boosted trees (XGB) with that of logistic regression and feedforward neural networks.

The modeling dataset contained 3548 persons who died by suicide and 35,480 persons who did not die by suicide between 2000 and 2016. A total of 101 predictors were selected, and these were assembled for each of the 40 quarters (10 years) prior to the quarter of death, resulting in 4040 predictors in total for each person. Model configurations were evaluated using 10-fold cross-validation.

The optimal recurrent neural network model configuration (AUC: 0.8407), one-dimensional convolutional neural network configuration (AUC: 0.8419), and XGB model configuration (AUC: 0.8493) all outperformed logistic regression (AUC: 0.8179). In addition to superior discrimination, the optimal XGB model configuration also achieved superior calibration.

Although the models developed in this study showed promise, further research is needed to determine the performance limits of statistical and machine learning models that quantify suicide risk, and to develop prediction models optimized for implementation in clinical settings. It appears that the XGB model class is the most promising in terms of discrimination, calibration, and computational expense.

Many important predictors are not available in administrative dat
ISSN: 0165-0327, 1573-2517
DOI: 10.1016/j.jad.2019.12.024
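
The evaluation design described in the summary (101 predictors over 40 quarters flattened to 4040 features, 10-fold cross-validation, and AUC as the discrimination measure) can be illustrated with a minimal, standard-library Python sketch. This is not the authors' code: the function names `flatten_history`, `stratified_folds`, and `auc` are hypothetical, the stratified fold assignment is an assumption (the record only states that 10-fold cross-validation was used), and the values in the usage lines are synthetic.

```python
import random

N_PREDICTORS = 101  # predictors per quarter (from the summary)
N_QUARTERS = 40     # quarters of history per person (from the summary)


def flatten_history(history, n_recent=N_QUARTERS):
    """Flatten a quarters-by-predictors table into one feature vector.

    `history` is a list of quarters, oldest first; each quarter is a list of
    predictor values. Only the `n_recent` (>= 1) most recent quarters are kept,
    which mimics building analytic datasets with fewer or more quarters.
    """
    recent = history[-n_recent:]
    return [value for quarter in recent for value in quarter]


def stratified_folds(labels, k=10, seed=0):
    """Assign each row to one of k folds, keeping class shares similar per fold.

    Stratification is an assumption of this sketch, not a detail from the record.
    """
    rng = random.Random(seed)
    fold_of = [0] * len(labels)
    for cls in set(labels):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for position, i in enumerate(idx):
            fold_of[i] = position % k
    return fold_of


def auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average 1-based rank for the tied run
        for t in range(i, j + 1):
            ranks[order[t]] = mean_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Toy usage with synthetic values (not study data).
history = [[0.0] * N_PREDICTORS for _ in range(N_QUARTERS)]
full_vector = flatten_history(history)               # 101 x 40 features
past_year = flatten_history(history, n_recent=4)     # most recent 4 quarters
```

In a real analysis, each model class would be fitted on nine folds and scored on the held-out fold, with `auc` averaged across the ten folds; a library implementation of the metric and the splitter would normally be preferred over these hand-written helpers.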