Prediction of early childhood obesity with machine learning and electronic health record data

Bibliographic details
Published in: International journal of medical informatics (Shannon, Ireland), 2021-06, Vol.150, p.104454-104454, Article 104454
Main authors: Pang, Xueqin, Forrest, Christopher B., Lê-Scherban, Félice, Masino, Aaron J.
Format: Article
Language: eng
Online access: Full text
Description
Summary:
• Machine learning models provide good prediction of future childhood obesity at age 2 years.
• A comprehensive EHR based machine learning workflow is presented.
• Built-in data quality control, data transformation, missing data imputation, obesity prediction, and model interpretation.
• Head circumference, body temperature and respiratory rate were identified as important model features.
• Prediction accuracy varies slightly among different race/ethnicity groups.

This study compares seven machine learning models developed to predict childhood obesity from age > 2 to ≤ 7 years using Electronic Healthcare Record (EHR) data up to age 2 years. EHR data from 860,510 patients with 11,194,579 healthcare encounters were obtained from the Children’s Hospital of Philadelphia. After applying stringent quality control to remove implausible growth values and including only individuals with all recommended wellness visits by age 7 years, 27,203 (50.78 % male) patients remained for model development. Seven machine learning models were developed to predict obesity incidence as defined by the Centers for Disease Control and Prevention (age- and sex-adjusted BMI > 95th percentile). Model performance was evaluated by multiple standard classifier metrics, and the differences among the seven models were compared using Cochran's Q test and post-hoc pairwise testing. XGBoost yielded 0.81 (0.001) AUC, which outperformed all other models. It also achieved statistically significantly better performance than all other models on standard classifier metrics (sensitivity fixed at 80 %): precision 30.90 % (0.22 %), F1-score 44.60 % (0.26 %), accuracy 66.14 % (0.41 %), and specificity 63.27 % (0.41 %). Early childhood obesity prediction models were developed from the largest cohort reported to date. Relative to prior research, our models generalize to include males and females in a single model and extend the time frame for obesity incidence prediction to 7 years of age.
The presented machine learning model development workflow can be adapted to various EHR-based studies and may be valuable for developing other clinical prediction models.
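The summary reports precision, F1-score, accuracy, and specificity with sensitivity fixed at 80 %, but does not say how the operating threshold was chosen. The sketch below shows one plausible way to do it, as an illustration only and not the authors' implementation: it assumes the threshold is the highest predicted score at which sensitivity reaches the target. The function name `metrics_at_fixed_sensitivity` and the toy data are hypothetical.

```python
from typing import Dict, Sequence


def metrics_at_fixed_sensitivity(
    y_true: Sequence[int],
    scores: Sequence[float],
    target_sensitivity: float = 0.80,
) -> Dict[str, float]:
    """Report classifier metrics at the strictest threshold that reaches
    the target sensitivity (assumed selection rule, not from the paper)."""
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")

    # Try every distinct score as a threshold, strictest first, and stop at
    # the first one whose sensitivity is at least the target.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
        if tp / positives >= target_sensitivity:
            break

    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    fn = positives - tp
    tn = negatives - fp

    precision = tp / (tp + fp)
    sensitivity = tp / positives
    return {
        "threshold": threshold,
        "sensitivity": sensitivity,
        "specificity": tn / negatives,
        "precision": precision,
        "accuracy": (tp + tn) / len(y_true),
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }


if __name__ == "__main__":
    # Toy labels (1 = obese, 0 = not obese) and model scores, made up for illustration.
    labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
    predicted = [0.9, 0.8, 0.7, 0.6, 0.2, 0.65, 0.3, 0.25, 0.1, 0.05]
    for name, value in metrics_at_fixed_sensitivity(labels, predicted).items():
        print(f"{name}: {value:.3f}")
```

Fixing sensitivity puts all models at a comparable operating point, so differences in precision and specificity reflect how many false positives each model accepts to catch the same share of true cases.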
ISSN: 1386-5056, 1872-8243
DOI: 10.1016/j.ijmedinf.2021.104454