Machine learning–based outcome prediction and novel hypotheses generation for substance use disorder treatment

Bibliographic details
Published in: Journal of the American Medical Informatics Association: JAMIA, June 2021, Vol. 28 (6), pp. 1216-1224
Main authors: Nasir, Murtaza; Summerfield, Nichalin S; Oztekin, Asil; Knight, Margaret; Ackerson, Leland K; Carreiro, Stephanie
Format: Article
Language: English
Description
Summary:
Objective: Substance use disorder is a critical public health issue. Discovering the synergies among factors impacting treatment program success can help governments and treatment facilities develop effective policies. In this work, we propose a novel data analytics approach using machine learning models to discover interaction effects that might be neglected by traditional hypothesis-generating approaches.
Materials and Methods: A patient-episode-level substance use treatment discharge dataset and a Federal Bureau of Investigation crime dataset were joined using core-based statistical area codes. Random forests, artificial neural networks, and extreme gradient boosting were applied with a nested cross-validation methodology. Interaction effects were identified based on the machine learning model with the best performance. These interaction effects were analyzed and tested using traditional logistic regression models on unseen data.
Results: In predicting patient completion of a treatment program, extreme gradient boosting performed the best, with an area under the curve of 89.31%. Based on our procedure, 73 interaction effects were identified. Among these, 14 were tested using traditional logistic regression models, of which 12 were statistically significant (P
ISSN: 1527-974X, 1067-5027
DOI: 10.1093/jamia/ocaa350
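The summary states that a patient-episode-level treatment discharge dataset and an FBI crime dataset were joined on core-based statistical area (CBSA) codes. The following is a minimal sketch of such a key-based inner join in plain Python, assuming each record is a dict carrying a hypothetical `cbsa` field. The function name `join_on_cbsa`, the field names, and the toy values are invented for illustration and do not come from the paper or its data.

```python
from collections import defaultdict

def join_on_cbsa(episodes, crime_rows, key="cbsa"):
    """Inner-join episode records to area-level crime records on a shared CBSA code.

    Each episode is paired with every crime row that has the same code;
    episodes whose code has no matching crime row are dropped.
    """
    # Index the area-level table by its CBSA code for constant-time lookups.
    by_code = defaultdict(list)
    for row in crime_rows:
        by_code[row[key]].append(row)

    joined = []
    for ep in episodes:
        for row in by_code.get(ep[key], []):
            # Episode fields take precedence if a column name appears in both.
            joined.append({**row, **ep})
    return joined

# Hypothetical toy records: field names and values are illustrative only.
episodes = [
    {"episode_id": 1, "cbsa": "11111", "completed": 1},
    {"episode_id": 2, "cbsa": "22222", "completed": 0},
]
crime = [{"cbsa": "11111", "violent_crime_rate": 3.2}]
print(join_on_cbsa(episodes, crime))
```

The dictionary index keeps the join linear in the total number of records; with pandas, `merge(..., on="cbsa", how="inner")` would perform the equivalent operation.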
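The summary reports that random forests, neural networks, and extreme gradient boosting were compared using nested cross-validation, but it does not give fold counts, hyperparameter grids, or the tuning metric. The sketch below shows the general structure of nested k-fold cross-validation using only the standard library: an inner loop selects hyperparameters on the outer-training data, and an outer loop scores the tuned model on a fold the tuning never saw. The names `kfold_indices`, `nested_cv`, and `threshold_accuracy`, the fold counts (5 outer, 3 inner), and the toy threshold "model" standing in for the real learners are all assumptions for illustration, not the authors' pipeline.

```python
import random

def kfold_indices(n, k, seed=0):
    """Shuffle range(n) and deal it into k near-equal, disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def nested_cv(n, candidates, fit_score, outer_k=5, inner_k=3, seed=0):
    """Nested k-fold cross-validation (hypothetical helper).

    fit_score(params, train_idx, test_idx) fits a model with `params` on
    train_idx and returns a score on test_idx (higher is better).
    Returns the outer-fold scores and the candidate chosen in each outer fold.
    """
    outer_scores, chosen = [], []
    for i, test_idx in enumerate(kfold_indices(n, outer_k, seed)):
        held_out = set(test_idx)
        train_idx = [j for j in range(n) if j not in held_out]
        inner_folds = kfold_indices(len(train_idx), inner_k, seed + i + 1)

        def inner_mean(params):
            # Inner loop: tune using only the outer-training indices.
            scores = []
            for val_pos in inner_folds:
                val_set = set(val_pos)
                val = [train_idx[p] for p in val_pos]
                fit = [train_idx[p] for p in range(len(train_idx)) if p not in val_set]
                scores.append(fit_score(params, fit, val))
            return sum(scores) / len(scores)

        best = max(candidates, key=inner_mean)
        chosen.append(best)
        # Outer loop: refit on all outer-training data, score on the untouched fold.
        outer_scores.append(fit_score(best, train_idx, test_idx))
    return outer_scores, chosen

# Toy problem: the label is 1 when x > 0.5; the "model" is a threshold rule
# whose threshold is the hyperparameter being tuned (training data is unused).
xs = [i / 20 for i in range(20)]
ys = [int(x > 0.5) for x in xs]

def threshold_accuracy(t, train_idx, test_idx):
    return sum(int(xs[j] > t) == ys[j] for j in test_idx) / len(test_idx)

scores, picks = nested_cv(len(xs), [0.1, 0.5, 0.9], threshold_accuracy)
print(scores, picks)
```

The design point is that the outer test fold never influences which hyperparameters are chosen, so the averaged outer scores estimate the performance of the whole tune-then-fit procedure rather than of one lucky configuration.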
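The summary says that candidate interaction effects were tested with traditional logistic regression on unseen data, without stating the model specification. A generic form for testing one pairwise interaction is shown below; it is an assumed textbook specification, not the paper's exact model. Here $Y_i = 1$ if episode $i$ completed treatment, $x_{i1}$ and $x_{i2}$ are the two factors in a candidate interaction, and $\mathbf{z}_i$ is a hypothetical vector of other covariates.

```latex
\operatorname{logit}\Pr(Y_i = 1)
  = \ln\frac{\Pr(Y_i = 1)}{1 - \Pr(Y_i = 1)}
  = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3\, x_{i1} x_{i2}
    + \boldsymbol{\gamma}^{\top}\mathbf{z}_i,
\qquad H_0\colon \beta_3 = 0 .
```

Under this form, an interaction is statistically significant when a test of $\beta_3$ (for example, a Wald or likelihood-ratio test) rejects $H_0$, meaning the effect of $x_{i1}$ on the log-odds of completion depends on the level of $x_{i2}$.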