Phenotyping people with a history of injecting drug use within electronic medical records using an interactive machine learning approach

Bibliographic Details
Published in: NPJ digital medicine 2024-11, Vol.7 (1), p.346-10
Main authors: El-Hayek, Carol, Nguyen, Thi, Hellard, Margaret E., Curtis, Michael, Sacks-Davis, Rachel, Aung, Htein Linn, Asselin, Jason, Boyle, Douglas I. R., Wilkinson, Anna, Polkinghorne, Victoria, Hocking, Jane S., Dunn, Adam G.
Format: Article
Language: English
Description
Summary: People with a history of injecting drug use are a priority for eliminating blood-borne viruses and sexually transmissible infections. Identifying them for disease surveillance in electronic medical records (EMRs) is challenged by sparsity of predictors. This study introduced a novel approach to phenotype people who have injected drugs using structured EMR data and interactive human-in-the-loop methods. We iteratively trained random forest classifiers, removing important features and adding new positive labels each time. The initial model achieved 92.7% precision and 93.5% recall. Models maintained >90% precision and recall after nine iterations, revealing combinations of less obvious features influencing predictions. Applied to approximately 1.7 million patients, the final model identified 128,704 (7.7%) patients as potentially having injected drugs, beyond the 50,510 (2.9%) with known indicators of injecting drug use. This process produced explainable models that revealed otherwise hidden combinations of predictors, offering an adaptive approach to addressing the inherent challenge of inconsistently missing data in EMRs.
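The iterative loop described in the summary (retrain a random forest, remove the most important features, add new positive labels) could be sketched roughly as follows. This is a minimal illustration on synthetic data, not the authors' pipeline: the function name `iterative_phenotyping`, the feature names, the number of features dropped per round, and the probability threshold used in place of human review are all assumptions made for the example. It relies on scikit-learn's `RandomForestClassifier` and its impurity-based `feature_importances_`.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def iterative_phenotyping(X, y, feature_names, n_rounds=3, drop_per_round=1,
                          promote_threshold=0.9, seed=0):
    """Retrain a random forest each round, drop its top-ranked features,
    and promote confidently scored unlabeled rows to positive labels.

    The probability threshold is an assumed stand-in for the human
    review step; it is not taken from the paper.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y).copy()
    active = list(range(X.shape[1]))  # column indices still in use
    history = []

    for round_no in range(n_rounds):
        # Stop if too few features remain or only one class is left.
        if len(active) <= drop_per_round or len(np.unique(y)) < 2:
            break
        model = RandomForestClassifier(n_estimators=100, random_state=seed)
        model.fit(X[:, active], y)

        # Rank the remaining features by impurity-based importance.
        order = np.argsort(model.feature_importances_)[::-1]
        dropped = [active[i] for i in order[:drop_per_round]]

        # Promote unlabeled (0) rows that the model scores as likely positive.
        positive_col = list(model.classes_).index(1)
        proba = model.predict_proba(X[:, active])[:, positive_col]
        promoted = np.flatnonzero((y == 0) & (proba >= promote_threshold))
        y[promoted] = 1

        history.append({
            "round": round_no,
            "dropped": [feature_names[j] for j in dropped],
            "n_promoted": int(promoted.size),
        })
        active = [j for j in active if j not in dropped]

    return history, [feature_names[j] for j in active], y


# Synthetic binary features (hypothetical names): three correlate with a
# hidden status, two are noise. Initial labels come from one strong indicator.
rng = np.random.default_rng(0)
n = 400
latent = rng.random(n) < 0.3
X = np.column_stack([
    latent & (rng.random(n) < 0.9),   # strong, "obvious" indicator
    latent & (rng.random(n) < 0.6),   # weaker related feature
    latent & (rng.random(n) < 0.5),   # weaker related feature
    rng.random(n) < 0.2,              # noise
    rng.random(n) < 0.5,              # noise
]).astype(float)
names = ["strong_indicator", "related_a", "related_b", "noise_a", "noise_b"]
y = (X[:, 0] == 1).astype(int)

history, remaining, y_final = iterative_phenotyping(X, y, names, n_rounds=3)
for entry in history:
    print(entry)
```

Removing the top-ranked features each round forces later models to rely on less obvious predictors, which is the effect the summary attributes to the approach. In practice the promotion step would be a reviewer's decision rather than a fixed threshold.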
ISSN: 2398-6352
DOI: 10.1038/s41746-024-01318-y