507-P: Influence of Past Information to Precision of Diabetic Nephropathy Aggravation Prediction


Bibliographic details
Published in: Diabetes (New York, N.Y.), 2019-06, Vol. 68 (Supplement_1)
Main authors: KOSEKI, AKIRA; ONO, MASAKI; KUDO, MICHIHARU; HAIDA, KYOICHI; MAKINO, MASAKI; SUZUKI, ATSUSHI
Format: Article
Language: English
Abstract:
Background: Given large datasets of electronic health records (EHRs), predicting future disease status is important for decisions about medical treatment. Modern machine learning techniques make it increasingly easy to build complex models that predict future outcomes. Such models derive their explanatory variables from past information, but little is known about how far back data should be collected. In some cases, very recent trends influence the future disease status, while in others, older events are the important causes of a change in disease status. Our interest therefore lies in how far back the data must reach to build good prediction models.
Method: In this paper, we discuss a set of machine learning algorithms for predicting the future stage of diabetic nephropathy, using sets of input variables collected over different time spans of past records. To compare performance, we used Logistic Regression, AdaBoost, Gradient Boosting, Decision Tree, Multi-layer Perceptron, and Random Forest. We then built different sets of EHR variables covering the past 30, 60, 90, 180, 210, 240, 270, 300, 330, and 360 days, from which we extracted several longitudinal statistics as input variables. Using data from about 65,000 patients with type 2 diabetes, the models classify whether the nephropathy stage worsens or stays the same within 180 days.
Results: For almost all algorithms, AUC improved as older data were included, and the 360-day data set gave the best results. Among the algorithms, Gradient Boosting gave the best AUC, 0.77, using the 360-day data set. With the 360-day data set, Decision Tree gave the worst AUC, 0.61.
Conclusion: Using past data up to 360 days back, we observed that the data set reaching furthest back gave the best prediction performance. Longitudinal statistics over a relatively long span provide good explanatory information for future nephropathy development.
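The abstract does not say which EHR variables or which longitudinal statistics were extracted, so the following is only a minimal sketch under stated assumptions, not the authors' pipeline. It assumes one numeric lab series per patient (the `Observation` type, the eGFR example, and the function names `window_features` and `roc_auc` are all hypothetical), summarizes it with count, mean, min, max, and least-squares slope over a look-back window ending at the prediction date, and computes AUC in its pairwise (Mann-Whitney) form, the metric used to compare the models. Training the six classifiers themselves is not shown.

```python
"""Hypothetical sketch: look-back window features and AUC (not the authors' code)."""
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Sequence

# Look-back windows (in days) listed in the abstract.
WINDOWS: List[int] = [30, 60, 90, 180, 210, 240, 270, 300, 330, 360]


@dataclass
class Observation:
    day: int      # days since an arbitrary reference date (assumed encoding)
    value: float  # one lab value, e.g. eGFR (assumed variable)


def window_features(obs: Sequence[Observation], index_day: int,
                    window_days: int) -> Dict[str, float]:
    """Summarize observations falling in [index_day - window_days, index_day)."""
    nan = float("nan")
    inside = [o for o in obs if index_day - window_days <= o.day < index_day]
    if not inside:
        # No data in the window: return a zero count and NaN statistics.
        return {"count": 0.0, "mean": nan, "min": nan, "max": nan, "slope": nan}
    days = [o.day for o in inside]
    values = [o.value for o in inside]
    slope = nan
    if len(inside) >= 2:
        # Ordinary least-squares slope of value against day (units per day).
        d_bar, v_bar = mean(days), mean(values)
        sxx = sum((d - d_bar) ** 2 for d in days)
        if sxx > 0:
            sxy = sum((d - d_bar) * (v - v_bar) for d, v in zip(days, values))
            slope = sxy / sxx
    return {"count": float(len(values)), "mean": float(mean(values)),
            "min": min(values), "max": max(values), "slope": slope}


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC = P(score of random positive > score of random negative); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy history for one patient; the prediction is made at day 365.
    history = [Observation(d, v) for d, v in
               [(10, 80.0), (200, 75.0), (300, 70.0), (350, 68.0)]]
    for w in WINDOWS:
        print(w, window_features(history, index_day=365, window_days=w))
    # Toy labels (1 = stage worsened) and model scores.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

With the toy history, the 30-day window sees only the day-350 value, while the 360-day window sees all four observations, which illustrates how longer windows feed more history into the same statistics. The toy AUC evaluates to 0.75. The pairwise AUC loop costs O(positives × negatives), so for a cohort of about 65,000 patients a rank-based computation or a library routine would be the more practical choice; the NaN statistics for empty windows would also need imputation before most classifiers could use them.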
ISSN: 0012-1797, 1939-327X
DOI: 10.2337/db19-507-P