Diabetes disease detection and classification on Indian demographic and health survey data using machine learning methods

Bibliographic Details
Published in:	Diabetes & metabolic syndrome clinical research & reviews 2023-01, Vol.17 (1), p.102690-102690, Article 102690
Main authors:	Thotad, Puneeth N., Bharamagoudar, Geeta R., Anami, Basavaraj S.
Format:	Article
Language:	English
Subjects:	Bayes Theorem; Classifiers; Demography; Diabetes; Diabetes Mellitus - diagnosis; Humans; Kernel entropy component analysis; Machine Learning; Risk Factors; Supervised learning
Online access:	Full text

Description
Abstract:	Diabetes mellitus has become a major cause of health problems in developing countries such as India, and there is a clear need to leverage technology in diabetes management. The main objective of this work is to deploy machine learning methods for the clinically relevant detection and classification of diabetes. The Indian Demographic and Health Survey 2016 dataset is used, and risk factors are determined for both continuous and categorical data. Kernel entropy component analysis is used for dimensionality reduction of the feature set. Predictive, exploration-based machine learning methods are deployed: logistic regression, Gaussian naive Bayes, linear discriminant analysis, support vector classifier, k-nearest neighbor, decision tree, extreme gradient boosting, kernel entropy component analysis, and random forest. The methodology has three phases: feature extraction, classification, and prediction. Random Forest gave the maximum classification accuracy of 99.84% and 96.75% for the imbalanced dataset and the kernel entropy component analysis-induced balanced dataset (balanced using the synthetic minority oversampling technique), respectively. The maximum precision of 99.64% was obtained using a support vector classifier on the balanced dataset. An area under the curve of 99% was observed for the kernel entropy component analysis-induced random forest on the balanced dataset. All other models performed moderately when applied to the kernel entropy component analysis-trained dataset. The Random Forest model performed better than the other models. The overall performance of the machine learning models can be improved by training on the diabetes dataset after kernel entropy component analysis.
•Seven out of twelve attributes contribute strongly to diabetes among Indians.
•Kernel entropy component analysis is used to classify traditional and non-traditional risk factors.
•Machine learning models are applied, and Random Forest gave the maximum classification accuracy.
ISSN:	1871-4021 1878-0334
DOI:	10.1016/j.dsx.2022.102690
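
The abstract names kernel entropy component analysis (KECA) as the dimensionality-reduction step but gives no implementation details. The following is a minimal NumPy sketch of the general KECA idea, not the authors' code: eigen-decompose an uncentered Gaussian kernel matrix and keep the axes that contribute most to a Renyi entropy estimate, rather than the axes with the largest eigenvalues. The function names, the kernel width `sigma`, the number of components, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(X, Y, sigma):
    """Gaussian (RBF) kernel matrix between the rows of X and the rows of Y."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clamp tiny negative distances caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))


def keca_fit_transform(X, n_components, sigma):
    """Hypothetical helper: project X onto the top entropy-contributing kernel axes.

    Returns the projected data, the indices of the selected axes, and the
    entropy contribution of every axis.
    """
    K = rbf_kernel(X, X, sigma)              # KECA uses the uncentered kernel matrix
    eigvals, eigvecs = np.linalg.eigh(K)     # eigenvalues in ascending order
    eigvals = np.clip(eigvals, 0.0, None)    # guard against tiny negative values

    # Contribution of axis i to the entropy estimate: lambda_i * (e_i^T 1)^2
    contributions = eigvals * (eigvecs.sum(axis=0) ** 2)

    # Keep the axes with the largest entropy contributions.
    top = np.argsort(contributions)[::-1][:n_components]
    projected = eigvecs[:, top] * np.sqrt(eigvals[top])
    return projected, top, contributions


# Synthetic placeholder data (NOT the survey data): 60 samples, 12 features.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 12))
Z_demo, top_demo, contrib_demo = keca_fit_transform(X_demo, n_components=3, sigma=2.0)
```

In a pipeline like the one the abstract describes, the reduced features (`Z_demo` here) would then be passed to a classifier such as a random forest. How the authors combined KECA with oversampling and model training is not specified in this record.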