A Classification Model For Class Imbalance Dataset Using Genetic Programming
Published in: | IEEE Access 2019, Vol. 7, pp. 71013-71037 |
---|---|
Main authors: | , , , |
Format: | Article |
Language: | English |
Keywords: | |
Online access: | Full text |
Abstract: | For the last few decades, class imbalance has been one of the most challenging problems in fields such as data mining and machine learning. A dataset is imbalanced when its classes are distributed unevenly, which happens when the positive class is much smaller than the negative class. In this case, most standard classification algorithms fail to identify examples of the positive class, even though the positive class usually represents the main interest of the classification task. Several solutions have been proposed to address this problem, such as sampling-based methods (over-sampling and under-sampling), modifications at the classifier level, and combinations of two or more classifiers. However, most of these solutions remain biased towards the negative class, are computationally expensive, have storage issues, or require long training times. An alternative approach is the genetic algorithm (GA), which has shown promising results. The GA is an evolutionary learning algorithm based on the principles of Darwinian evolution and is a powerful global search method. The fitness function is a key parameter of a GA: it determines how well a candidate solution solves the given problem. In this paper, we propose a solution that uses entropy and information gain as the fitness function of a GA, with the objective of reducing impurity and producing a more balanced result without changing the original dataset. Experiments on different datasets demonstrate the effectiveness of the proposed solution compared with several other state-of-the-art algorithms in terms of accuracy (Acc), geometric mean (GM), F-measure (FM), kappa, and Matthews correlation coefficient (MCC). |
ISSN: | 2169-3536 |
DOI: | 10.1109/ACCESS.2019.2915611 |
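
The abstract says that entropy and information gain serve as the GA's fitness function but does not describe how solutions are encoded. The sketch below is a minimal illustration under our own assumptions, not the authors' method. Entropy is $H(S) = -\sum_c p_c \log_2 p_c$, and the information gain of a split of $S$ into $S_L$ and $S_R$ is $H(S) - \frac{|S_L|}{|S|}H(S_L) - \frac{|S_R|}{|S|}H(S_R)$. Each chromosome here is assumed to be a single threshold on one numeric feature, which is a hypothetical toy encoding (the title suggests the paper evolves genetic-programming expressions, which this sketch does not attempt). The operators (binary tournament selection, arithmetic crossover, Gaussian mutation, elitism), the parameter values (`pop_size`, `generations`, `sigma`) and the function names `entropy`, `information_gain`, `fitness` and `evolve` are all illustrative assumptions.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the two children."""
    n = len(parent)
    children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - children


def fitness(threshold, xs, ys):
    """HYPOTHETICAL fitness: information gain of the split `x <= threshold`."""
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    return information_gain(ys, left, right)


def evolve(xs, ys, pop_size=20, generations=30, sigma=0.5, seed=0):
    """Minimal GA over one real-valued gene (a split threshold), with elitism."""
    rng = random.Random(seed)
    lo, hi = min(xs), max(xs)
    pop = [rng.uniform(lo, hi) for _ in range(pop_size)]

    def select(population):
        # Binary tournament: the fitter of two random individuals wins.
        a, b = rng.sample(population, 2)
        return a if fitness(a, xs, ys) >= fitness(b, xs, ys) else b

    for _ in range(generations):
        best = max(pop, key=lambda t: fitness(t, xs, ys))
        children = [best]  # elitism: the best individual always survives
        while len(children) < pop_size:
            child = (select(pop) + select(pop)) / 2   # arithmetic crossover
            child += rng.gauss(0.0, sigma)             # Gaussian mutation
            children.append(min(max(child, lo), hi))   # clamp to the feature range
        pop = children
    return max(pop, key=lambda t: fitness(t, xs, ys))


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    ys = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]  # 8 negatives, 2 positives
    # Any threshold in [8, 9) separates the classes perfectly, so the
    # maximum possible gain equals entropy(ys), about 0.722 bits.
    t = evolve(xs, ys)
    print(t, fitness(t, xs, ys), entropy(ys))
```

Because information gain rewards splits that make each side purer, a threshold that isolates the small positive class scores highest even though it misclassifies nothing in the majority class. This is the intuition behind using impurity reduction rather than raw accuracy as the fitness signal, and it works on the original data without resampling.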
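
The abstract reports accuracy (Acc), geometric mean (GM), F-measure (FM), kappa and MCC. The sketch below shows how these are commonly computed from a binary confusion matrix, using the standard textbook definitions (GM as the square root of sensitivity times specificity, FM as the harmonic mean of precision and recall, Cohen's kappa, and Matthews correlation coefficient). The function name `binary_metrics` and the convention of returning 0.0 when a denominator is zero are our assumptions; the paper may handle such edge cases differently.

```python
import math


def binary_metrics(tp, fn, fp, tn):
    """Acc, GM, F-measure, Cohen's kappa and MCC from a binary confusion matrix."""
    n = tp + fn + fp + tn
    acc = (tp + tn) / n
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall on the positive class
    specificity = tn / (tn + fp) if tn + fp else 0.0  # recall on the negative class
    precision = tp / (tp + fp) if tp + fp else 0.0
    gm = math.sqrt(sensitivity * specificity)
    fm = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    # Cohen's kappa: observed agreement (acc) vs. agreement expected by chance,
    # i.e. P(actual pos) * P(pred pos) + P(actual neg) * P(pred neg).
    p_e = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / (n * n)
    kappa = (acc - p_e) / (1 - p_e) if p_e != 1 else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Acc": acc, "GM": gm, "FM": fm, "kappa": kappa, "MCC": mcc}


if __name__ == "__main__":
    # A classifier that labels every example negative on 95 negatives / 5 positives.
    print(binary_metrics(tp=0, fn=5, fp=0, tn=95))
```

For the all-negative classifier in the usage example, Acc is 0.95 while GM, FM, kappa and MCC are all 0, which illustrates why accuracy alone is misleading on imbalanced data and why the evaluation also uses the other four measures.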