Machine learning-based prediction of diabetic patients using blood routine data


Bibliographic details
Published in: Methods (San Diego, Calif.), 2024-09, Vol. 229, pp. 156-162
Authors: Li, Honghao, Su, Dongqing, Zhang, Xinpeng, He, Yuanyuan, Luo, Xu, Xiong, Yuqiang, Zou, Min, Wei, Huiyan, Wen, Shaoran, Xi, Qilemuge, Zuo, Yongchun, Yang, Lei
Format: Article
Language: English
Description
Abstract:
• A computational framework was proposed to predict diabetes from routine blood test data collected from hospitals and health centers.
• The contributions of different routine blood indicators to diabetes prediction were identified by our framework.
• A nomogram was constructed to assess the influence of routine blood indicators on prediction outcomes.

Diabetes is one of the most prevalent chronic diseases worldwide. With conventional diagnostic approaches, diabetes is frequently overlooked until individuals show noticeable symptoms of the condition. To address this gap, this study collected routine blood test data from 1000 diabetes patients and an equal number of records from healthy individuals. To distinguish diabetes patients from healthy individuals, a computational framework was established that includes eXtreme Gradient Boosting (XGBoost), random forest, support vector machine, and elastic net algorithms. The XGBoost model performed best, with an area under the receiver operating characteristic curve (AUC) of 99.90% on the training set and 98.51% on the testing set. In external validation, the model achieved an overall accuracy of 81.54%. The probability generated by the model serves as a risk score for diabetes susceptibility. The Shapley additive explanations (SHAP) algorithm was used to interpret the model and identified key indicators such as mean corpuscular hemoglobin concentration (MCHC), lymphocyte ratio (LY%), standard deviation of red blood cell distribution width (RDW-SD), and mean corpuscular hemoglobin (MCH). These findings improve our understanding of the predictive mechanisms underlying diabetes.
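The AUC reported above is the probability that the model scores a randomly chosen diabetes patient higher than a randomly chosen healthy individual. As a minimal sketch (not the authors' code), the AUC can be computed with the rank-based Mann-Whitney formulation, and accuracy can be computed from the risk score at a cut-off; the function names, the 0.5 threshold, and the toy labels and scores are all hypothetical assumptions for illustration.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a positive > score of a negative); ties count 0.5.

    labels: 1 = diabetes, 0 = healthy. This is the pairwise (Mann-Whitney U)
    form, O(n_pos * n_neg), which is fine for a few thousand samples.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of correct calls when risk score >= threshold means diabetes.

    The 0.5 threshold is an assumption; the abstract does not state one.
    """
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)


# Hypothetical toy data: model probabilities used as risk scores.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
print(roc_auc(labels, scores))   # 0.75
print(accuracy(labels, scores))  # 0.5
```

Because AUC depends only on how cases are ranked, it is threshold-free, whereas accuracy (the metric reported for external validation) changes with the chosen cut-off.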
To facilitate application in clinical and everyday settings, a nomogram was built on a logistic regression model; it provides a preliminary estimate of an individual's likelihood of having diabetes. Overall, this research offers insights into the predictive modeling of diabetes, with potential applications in clinical practice for more effective and timely diagnosis.
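A logistic-regression nomogram gives each predictor a points scale, sums the points, and maps the total back to a predicted probability. The sketch below follows one common construction, in which the predictor with the widest contribution range spans 0 to 100 points; it is not the authors' model, and the `Nomogram` class, the intercept, the coefficients, and the value ranges are hypothetical placeholders rather than values from the study.

```python
import math
from typing import Dict, Tuple


def logistic(z: float) -> float:
    """Map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


class Nomogram:
    """Points-based view of an already fitted logistic regression model.

    HYPOTHETICAL sketch: the caller supplies the coefficients and ranges.
    """

    def __init__(self, intercept: float, coefs: Dict[str, float],
                 ranges: Dict[str, Tuple[float, float]]) -> None:
        self.intercept = intercept
        self.coefs = coefs
        # Smallest contribution beta * x over each predictor's range -> 0 points.
        self.min_contrib = {
            k: min(b * ranges[k][0], b * ranges[k][1]) for k, b in coefs.items()
        }
        # Widest contribution span across predictors -> that predictor gets 100 points.
        self.max_span = max(
            abs(b) * (ranges[k][1] - ranges[k][0]) for k, b in coefs.items()
        )

    def points(self, name: str, value: float) -> float:
        """Points for one predictor value on the shared scale."""
        contrib = self.coefs[name] * value - self.min_contrib[name]
        return 100.0 * contrib / self.max_span

    def total_points(self, x: Dict[str, float]) -> float:
        return sum(self.points(k, x[k]) for k in self.coefs)

    def probability(self, x: Dict[str, float]) -> float:
        """Convert total points back to the logistic model's probability."""
        linear = (self.intercept + sum(self.min_contrib.values())
                  + self.total_points(x) * self.max_span / 100.0)
        return logistic(linear)


# HYPOTHETICAL coefficients and ranges for illustration only (not from the study).
nomo = Nomogram(
    intercept=14.0,
    coefs={"MCHC": -0.05, "RDW_SD": 0.08},
    ranges={"MCHC": (300.0, 370.0), "RDW_SD": (35.0, 60.0)},
)
patient = {"MCHC": 320.0, "RDW_SD": 48.0}
print(round(nomo.total_points(patient), 1), round(nomo.probability(patient), 3))
```

Shifting each contribution by its minimum and rescaling by a common factor is exactly undone in `probability`, so the points total always reproduces the underlying model's prediction `logistic(intercept + sum(beta * x))`; the nomogram is a reading aid, not a different model.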
ISSN: 1046-2023, 1095-9130
DOI: 10.1016/j.ymeth.2024.07.001