Effects of Class Imbalance Using Machine Learning Algorithms: Case Study Approach

Bibliographic details
Published in: International journal of applied evolutionary computation, 2021-01, Vol. 12 (1), pp. 1-17
Main authors: Narwane, Swati V; Sawarkar, Sudhir D
Format: Article
Language: English
Summary: Class imbalance is a major hurdle for machine learning-based systems. The data set is the backbone of machine learning and must be studied in order to handle class imbalance. The purpose of this paper is to investigate the effect of class imbalance on data sets. The proposed methodology determines model accuracy as a function of class distribution. To find possible solutions, the behaviour of an imbalanced data set was investigated. The study considers two case studies in which the data set is divided into class distributions ranging from balanced to unbalanced. The data sets were tested with training and test data using standard machine learning algorithms. Model accuracy for each class distribution was measured on the training data set. The built model was then tested on each binary class individually. The results show that addressing class imbalance is essential for improving system performance. The study concludes that the system produces results biased toward the majority class. In the future, the multiclass imbalance problem can be studied using advanced algorithms.
ISSN: 1942-3594, 1942-3608
DOI: 10.4018/IJAEC.2021010101
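
The summary's conclusion, that results are biased toward the majority class, can be illustrated with a minimal sketch. This is not the authors' method or data: the 90/10 label split, the always-predict-majority "model", and the helper names `overall_accuracy` and `per_class_accuracy` are all assumptions made for illustration. It only shows why testing each binary class individually can reveal a problem that overall accuracy hides.

```python
from collections import Counter

def overall_accuracy(y_true, y_pred):
    """Fraction of all samples predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def per_class_accuracy(y_true, y_pred):
    """Map each label to the fraction of its samples predicted correctly."""
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / totals[label] for label in totals}

# Hypothetical imbalanced binary data set: 90 majority (0), 10 minority (1).
y_true = [0] * 90 + [1] * 10

# Assumed degenerate model: always predicts the most frequent label.
majority_label = Counter(y_true).most_common(1)[0][0]
y_pred = [majority_label] * len(y_true)

overall = overall_accuracy(y_true, y_pred)
by_class = per_class_accuracy(y_true, y_pred)

print("overall accuracy:", overall)
print("per-class accuracy:", by_class)
```

Under these assumptions the overall accuracy looks high, while the minority class is never predicted correctly, which is the kind of bias the summary describes.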