SMOTE-SMO-based expert system for type II diabetes detection using PIMA dataset

Bibliographic details
Published in: International journal of diabetes in developing countries 2022-04, Vol.42 (2), p.245-253
Main authors: Naz, Huma; Ahuja, Sachin
Format: Article
Language: eng
Description
Summary: Background: Medical data, which is critical to human existence, can be used to identify people who may be prone to a specific complication or disease by applying appropriate data mining (DM) techniques. DM is applied specifically to extract details for the diagnosis, prediction, prevention, and treatment of various diseases. According to the International Diabetes Federation (IDF) 2019 atlas report, diabetes caused 4.2 million deaths worldwide; hence, it is critical to diagnose diabetes at an early stage. Material and method: Although many techniques are available to diagnose diabetes, existing methods do not find hidden patterns efficiently or with the accuracy needed for correct decision-making. This paper therefore presents an integrated approach that combines the synthetic minority oversampling technique (SMOTE) with the sequential minimal optimization (SMO) algorithm for predicting diabetes. In the proposed two-phase classification model, the first phase pre-processes the data with the SMOTE algorithm, and the second phase applies the SMO classifier. The output of the pre-processing phase is passed to SMO to improve the performance of the classifier. Result: The proposed classification model achieved an accuracy of 99.07% on the PIMA Indian diabetes dataset (PIDD). The PIDD was taken from the UCI repository for this work; the dataset is owned by the National Institute of Diabetes and Digestive and Kidney Diseases. It contains records of 768 female patients, each with 8 numeric attributes and one decision class attribute. Conclusion: The results of the study confirm that the proposed integrated DM approach could be used as an expert system for diagnosing diabetes in patients at an early stage. The features extracted in this study will be used to develop a prognostic tool, in the form of a mobile application, for early diabetes detection.
ISSN: 0973-3930, 1998-3832
DOI: 10.1007/s13410-021-00969-x
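
Illustration: The first phase of the two-phase model described in the summary, SMOTE oversampling, can be sketched roughly as follows. This is a minimal standard-library sketch of the general SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours), not the authors' implementation. The function name `smote`, its parameters, and the toy data are assumptions made for illustration; the values are not taken from the PIDD.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic points interpolated between minority samples.

    Hypothetical helper for illustration; the record does not give the
    settings used in the paper.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy data, made up for illustration: 6 majority rows and 3 minority rows
majority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3), (0.9, 0.7), (1.3, 1.0)]
minority = [(3.0, 3.0), (3.2, 2.8), (2.9, 3.3)]

# Generate enough synthetic minority rows to balance the two classes
new_points = smote(minority, n_synthetic=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
```

In the second phase, the balanced data would be passed to a support vector classifier trained with an SMO-type solver. One possible choice, again an assumption rather than something the record states, is scikit-learn's `SVC`, whose libsvm backend uses such a solver.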