Machine Learning Informed Diagnosis for Congenital Heart Disease in Large Claims Data Source

Bibliographic Details
Published in: JACC. Advances (Online), 2024-02, Vol. 3 (2), p. 100801, Article 100801
Main authors: Marelli, Ariane J., Li, Chao, Liu, Aihua, Nguyen, Hanh, Moroz, Harry, Brophy, James M., Guo, Liming, Buckeridge, David L., Tang, Jian, Yang, Archer Y., Li, Yue
Format: Article
Language: English
Description
Abstract: With increasing interest in using large claims databases in medical practice and research, efficiently identifying patients with the disease of interest is a meaningful and essential step. This study aims to establish a machine learning (ML) approach to identify patients with congenital heart disease (CHD) in large claims databases. We used data from the Quebec claims and hospitalization databases from 1983 to 2000. The study included 19,187 patients. Of these, 3,784 were labeled as true CHD patients using a clinician-developed algorithm with manual audits, which was considered the gold standard. To establish an accurate ML-empowered automated CHD classification system, we evaluated ML methods including Gradient Boosting Decision Tree, Support Vector Machine, and Decision Tree, and compared them to regularized logistic regression. The Area Under the Precision-Recall Curve was used as the evaluation metric. External validation was conducted on a data set updated to 2010 that contained different subjects. Among the ML methods we evaluated, Gradient Boosting Decision Tree performed best in identifying true CHD patients, with an Area Under the Precision-Recall Curve of 99.3%, a sensitivity of 98.0%, and a specificity of 99.7%. External validation returned similar statistics on model performance. This study shows that tedious and time-consuming clinical inspection for CHD patient identification can be replaced by an extremely efficient ML algorithm in large claims databases. Our findings demonstrate that ML methods can be used to automate complicated algorithms to identify patients with complex diseases.
ISSN: 2772-963X
DOI: 10.1016/j.jacadv.2023.100801
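
The abstract reports three evaluation metrics: Area Under the Precision-Recall Curve, sensitivity, and specificity. As a rough illustration of how such metrics are computed from classifier output, here is a minimal sketch in plain Python. It is not the authors' code: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for this example, and the area is approximated by the step-wise "average precision" estimator, which assumes no tied scores.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels coded 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise estimate of the area under the precision-recall curve.

    Patients are ranked by descending score; precision is recorded at the
    rank of each true positive and averaged over all positives.
    Assumes at least one positive label and no tied scores.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(y_true)
    true_positives = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            true_positives += 1
            total += true_positives / rank  # precision at this recall step
    return total / n_pos


# Hypothetical toy data: 1 = CHD per the reference label, 0 = no CHD.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
model_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

# Assumed decision threshold of 0.5 for the thresholded metrics.
predictions = [1 if s >= 0.5 else 0 for s in model_scores]

sens, spec = sensitivity_specificity(labels, predictions)
auprc = average_precision(labels, model_scores)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUPRC={auprc:.3f}")
```

The precision-recall area depends only on how the scores rank patients, whereas sensitivity and specificity depend on the chosen threshold. With only about one in five study patients labeled as true CHD (3,784 of 19,187), a precision-based summary is a natural choice, since precision reflects the number of false positives relative to the flagged cases.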