Generalizability of machine learning models for diabetes detection: a study with Nordic Islet Transplant and PIMA datasets
Published in: | Scientific Reports 2025-02, Vol.15 (1), p.4479-27, Article 4479 |
---|---|
Main authors: | , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | Diabetes Mellitus (DM) is a global health challenge, and accurate early detection is critical for effective management. The study explores the potential of machine learning for improved diabetes prediction using microarray gene expression data and the PIMA dataset. The researchers use a hybrid feature extraction (FE) method based on Artificial Bee Colony (ABC) and Particle Swarm Optimization (PSO), followed by metaheuristic feature selection (FS) algorithms: Harmonic Search (HS), Dragonfly Algorithm (DFA), and Elephant Herding Algorithm (EHA, also written EHO in this abstract). System performance is evaluated with seven classifiers: Non-Linear Regression (NLR), Linear Regression (LR), Gaussian Mixture Model (GMM), Expectation Maximization (EM), Bayesian Linear Discriminant Analysis (BLDA), Softmax Discriminant Classifier (SDC), and Support Vector Machine with Radial Basis Function kernel (SVM-RBF), on two publicly available datasets: the Nordic Islet Transplant Program (NITP) dataset and the PIMA Indian Diabetes Dataset (PIDD). The findings demonstrate a significant improvement in classification accuracy compared to using all genes. On the Nordic islet transplant dataset, combined ABC-PSO feature extraction with EHO feature selection achieved the highest accuracy of 97.14%, surpassing the 94.28% obtained with ABC alone and EHO selection. Similarly, on the PIMA Indian diabetes dataset, the ABC-PSO and EHO combination achieved the best accuracy of 98.13%, exceeding the 95.45% obtained with ABC and DFA selection. These results highlight the effectiveness of the proposed approach in identifying the most informative features for accurate diabetes prediction. The parametric values obtained for the two datasets are nearly the same, which indicates that the FE and FS methods, together with the classifiers, are robust across two different datasets. |
---|---|
ISSN: | 2045-2322 |
DOI: | 10.1038/s41598-025-87471-0 |
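
The summary compares classification accuracy on all features against accuracy on a metaheuristically chosen subset. The general idea of such a wrapper search can be sketched as below. This is only an illustration under loud assumptions: it uses a plain binary PSO as the subset search (not the study's ABC-PSO extraction or its HS, DFA, or EHO selection), a nearest-centroid classifier with leave-one-out scoring (not any of the seven classifiers named above), and synthetic data (not NITP or PIDD). All function names and parameter values are hypothetical.

```python
import math
import random


def make_data(rng, n_per_class=20, n_informative=3, n_noise=12):
    """Synthetic two-class data: only the first n_informative features carry signal."""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            informative = [rng.gauss(2.0 * label, 1.0) for _ in range(n_informative)]
            noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
            X.append(informative + noise)
            y.append(label)
    return X, y


def loo_centroid_accuracy(X, y, mask):
    """Leave-one-out accuracy of a nearest-centroid classifier on the masked features."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty subset cannot classify anything
    counts = {0: 0, 1: 0}
    sums = {0: [0.0] * len(cols), 1: [0.0] * len(cols)}
    for row, label in zip(X, y):
        counts[label] += 1
        for c, j in enumerate(cols):
            sums[label][c] += row[j]
    correct = 0
    for row, label in zip(X, y):
        dist = {}
        for cls in (0, 1):
            # Remove the held-out sample from its own class centroid.
            held_out = 1 if cls == label else 0
            n = counts[cls] - held_out
            dist[cls] = sum(
                (row[j] - (sums[cls][c] - held_out * row[j]) / n) ** 2
                for c, j in enumerate(cols)
            )
        correct += min(dist, key=dist.get) == label
    return correct / len(X)


def binary_pso(X, y, rng, n_particles=8, n_iter=15, w=0.7, c1=1.5, c2=1.5):
    """Binary PSO: each bit is set with probability sigmoid(velocity)."""
    d = len(X[0])
    pos = [[rng.random() < 0.5 for _ in range(d)] for _ in range(n_particles)]
    pos[0] = [True] * d  # seed one particle with all features as a baseline
    vel = [[0.0] * d for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [loo_centroid_accuracy(X, y, p) for p in pos]
    g = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(d):
                v = (w * vel[i][j]
                     + c1 * rng.random() * (pbest[i][j] - pos[i][j])
                     + c2 * rng.random() * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-4.0, min(4.0, v))  # clamp to keep sigmoid responsive
                pos[i][j] = rng.random() < 1.0 / (1.0 + math.exp(-vel[i][j]))
            fit = loo_centroid_accuracy(X, y, pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


rng = random.Random(0)
X, y = make_data(rng)
all_features_acc = loo_centroid_accuracy(X, y, [True] * len(X[0]))
best_mask, best_acc = binary_pso(X, y, rng)
print(f"all features:    {all_features_acc:.3f}")
print(f"selected subset: {best_acc:.3f} using {sum(best_mask)} features")
```

Because one particle starts from the full feature set, the reported subset accuracy can never fall below the all-features baseline in this sketch. Scoring the search and the final result on the same samples is optimistic, so a real evaluation would hold out a separate test set.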