Machine learning-based reproducible prediction of type 2 diabetes subtypes

Aims/hypothesis Clustering-based subclassification of type 2 diabetes, which reflects pathophysiology and genetic predisposition, is a promising approach for providing personalised and effective therapeutic strategies. Ahlqvist’s classification is currently the most vigorously validated method becau...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Diabetologia 2024-11, Vol.67 (11), p.2446-2458
Hauptverfasser: Tanabe, Hayato, Sato, Masahiro, Miyake, Akimitsu, Shimajiri, Yoshinori, Ojima, Takafumi, Narita, Akira, Saito, Haruka, Tanaka, Kenichi, Masuzaki, Hiroaki, Kazama, Junichiro J., Katagiri, Hideki, Tamiya, Gen, Kawakami, Eiryo, Shimabukuro, Michio
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Aims/hypothesis Clustering-based subclassification of type 2 diabetes, which reflects pathophysiology and genetic predisposition, is a promising approach for providing personalised and effective therapeutic strategies. Ahlqvist’s classification is currently the most vigorously validated method because of its superior ability to predict diabetes complications but it does not have strong consistency over time and requires HOMA2 indices, which are not routinely available in clinical practice and standard cohort studies. We developed a machine learning (ML) model to classify individuals with type 2 diabetes into Ahlqvist’s subtypes consistently over time. Methods Cohort 1 dataset comprised 619 Japanese individuals with type 2 diabetes who were divided into training and test sets for ML models in a 7:3 ratio. Cohort 2 dataset, comprising 597 individuals with type 2 diabetes, was used for external validation. Participants were pre-labelled (T2D kmeans ) by unsupervised k -means clustering based on Ahlqvist’s variables (age at diagnosis, BMI, HbA 1c , HOMA2-B and HOMA2-IR) to four subtypes: severe insulin-deficient diabetes (SIDD), severe insulin-resistant diabetes (SIRD), mild obesity-related diabetes (MOD) and mild age-related diabetes (MARD). We adopted 15 variables for a multiclass classification random forest (RF) algorithm to predict type 2 diabetes subtypes (T2D RF15 ). The proximity matrix computed by RF was visualised using a uniform manifold approximation and projection. Finally, we used a putative subset with missing insulin-related variables to test the predictive performance of the validation cohort, consistency of subtypes over time and prediction ability of diabetes complications. Results T2D RF15 demonstrated a 94% accuracy for predicting T2D kmeans type 2 diabetes subtypes (AUCs ≥0.99 and F1 score [an indicator calculated by harmonic mean from precision and recall] ≥0.9) and retained the predictive performance in the external validation cohort (86.3%). T2D RF15 showed an accuracy of 82.9% for detecting T2D kmeans , also in a putative subset with missing insulin-related variables, when used with an imputation algorithm. In Kaplan–Meier analysis, the diabetes clusters of T2D RF15 demonstrated distinct accumulation risks of diabetic retinopathy in SIDD and that of chronic kidney disease in SIRD during a median observation period of 11.6 (4.5–18.3) years, similarly to the subtypes using T2D kmeans . The predictive accuracy was improved after excludin
ISSN:0012-186X
1432-0428
1432-0428
DOI:10.1007/s00125-024-06248-8