Is handling unbalanced datasets for machine learning uplifts system performance?: A case of diabetic prediction

Bibliographic Details
Published in:	Diabetes & metabolic syndrome clinical research & reviews 2022-09, Vol.16 (9), p.102609-102609, Article 102609
Main authors:	Narwane, Swati V., Sawarkar, Sudhir D.
Format:	Article
Language:	English
Online access:	Full text

Description
Summary:	BACKGROUND AND AIMS: Healthcare is a sensitive sector, and addressing the class imbalance in the healthcare domain is a time-consuming task for machine learning-based systems due to the vast amount of data. This study looks into the impact of socioeconomic disparities on the healthcare data of diabetic patients to make accurate disease predictions. METHODS: This study proposed a systematic approach of Closest Distance Ranking and Principal Component Analysis to deal with the unbalanced dataset. A typical machine learning technique was used to analyze the proposed approach. The data set of pregnant diabetic women is analysed for accurate detection. RESULTS: The results of the case are analysed using sensitivity, which demonstrates that the minority class's lack of information makes it impossible to forecast the results. On the other hand, the unbalanced dataset was treated using the proposed technique and evaluated with the machine learning algorithm which significantly increased the performance of the system. CONCLUSION: The performance of the machine learning-based system was significantly enhanced by the unbalanced dataset which was processed with the proposed technique and evaluated with the machine learning algorithm. For the first time, an unbalanced dataset was treated with a combination of Closest Distance Ranking and Principal Component Analysis.
ISSN:	1871-4021 1878-0334
DOI:	10.1016/j.dsx.2022.102609