Development of a novel machine learning model to predict presence of nonalcoholic steatohepatitis

Bibliographic Details
Published in:	Journal of the American Medical Informatics Association : JAMIA 2021-03, Vol.28 (6), p.1235-1241
Main authors:	Docherty, Matt; Regnier, Stephane A; Capkun, Gorana; Balp, Maria-Magdalena; Ye, Qin; Janssens, Nico; Tietz, Andreas; Löffler, Jürgen; Cai, Jennifer; Pedrosa, Marcos C; Schattenberg, Jörn M
Format:	Article
Language:	English
Subjects:	Research and Applications
Online access:	Full text

Description
Summary:	Objective: To develop a computer model to predict patients with nonalcoholic steatohepatitis (NASH) using machine learning (ML). Materials and Methods: This retrospective study utilized two databases: a) the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) nonalcoholic fatty liver disease (NAFLD) adult database (2004-2009), and b) the Optum® de-identified Electronic Health Record dataset (2007-2018), a real-world dataset representative of common electronic health records in the United States. We developed an ML model to predict NASH, using confirmed NASH and non-NASH based on liver histology results in the NIDDK dataset to train the model. Results: Models were trained and tested on NIDDK NAFLD data (704 patients) and the best-performing models were evaluated on Optum data (~3,000,000 patients). An eXtreme Gradient Boosting (XGBoost) model consisting of 14 features exhibited high performance as measured by area under the curve (AUC; 0.82), sensitivity (81%), and precision (81%) in predicting NASH. Slightly reduced performance was observed with an abbreviated feature set of 5 variables (0.79, 80%, 80%, respectively). The full model demonstrated good performance (AUC 0.76) in predicting NASH in Optum data. Discussion: The proposed model, named NASHmap, is the first ML model developed with confirmed NASH and non-NASH cases as determined through liver biopsy and validated on a large, real-world patient dataset. Both the 14-feature and 5-feature versions exhibit high performance. Conclusion: The NASHmap model is a convenient and high-performing tool that could be used to identify patients likely to have NASH in clinical settings, allowing better patient management and optimal allocation of clinical resources.
ISSN:	1527-974X 1067-5027
DOI:	10.1093/jamia/ocab003
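
Note on the reported metrics: the summary quotes area under the curve, sensitivity, and precision. The following is a minimal sketch of how these three quantities are commonly computed for a binary classifier. It is not the authors' code; the function names, the 0.5 threshold, and the toy labels and scores are assumptions made up for illustration and are unrelated to the NIDDK or Optum data.

```python
def confusion_counts(labels, scores, threshold=0.5):
    """Count true/false positives and negatives at a given score threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    """Share of true cases that the model flags (also called recall)."""
    return tp / (tp + fn)


def precision(tp, fp):
    """Share of flagged cases that are true cases."""
    return tp / (tp + fp)


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented toy data: 1 = biopsy-confirmed case, 0 = non-case (hypothetical).
toy_labels = [1, 1, 1, 0, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.7, 0.2, 0.1]

tp, fp, tn, fn = confusion_counts(toy_labels, toy_scores)
toy_sensitivity = sensitivity(tp, fn)
toy_precision = precision(tp, fp)
toy_auc = roc_auc(toy_labels, toy_scores)
```

Sensitivity and precision depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the summary reports all three.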