DEBOHID: A differential evolution based oversampling approach for highly imbalanced datasets
Published in: | Expert Systems with Applications 2021-05, Vol.169, p.114482, Article 114482 |
---|---|
Main authors: | , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | • A novel oversampling method, DEBOHID, based on differential evolution is presented. • SVM, k-NN, and DT are used as classifiers. • The experimental results are shown to be independent of the classifier. • AUC and G-Mean are used as performance metrics. • The experiments show the superiority of DEBOHID for rare event detection.
The class distribution of the samples in a dataset is one of the critical factors affecting classification success. Classifiers trained on imbalanced datasets classify majority class samples more successfully than minority class samples. Oversampling, which increases the number of minority class samples, is a frequently used method for overcoming class imbalance. Over more than two decades, many oversampling methods have been proposed for the class imbalance problem. Differential Evolution (DE) is a metaheuristic algorithm that achieves successful results in many domains. One of the main reasons for this success is that DE has an effective mechanism for generating candidate individuals. In this work, we propose a novel oversampling method based on the differential evolution algorithm for highly imbalanced datasets, named DEBOHID (a Differential Evolution Based Oversampling approach for Highly Imbalanced Datasets). To evaluate DEBOHID, 44 datasets with high imbalance ratios are used in the experiments. The results are compared with those of nine state-of-the-art oversampling methods. To show that the experimental results do not depend on the classifier, Support Vector Machines (SVM), k-Nearest Neighbor (kNN), and Decision Tree (DT) are used as classifiers in the experiments. AUC and G-Mean are used as performance metrics. The experimental results and statistical analyses show the superiority of DEBOHID over the compared methods. |
---|---|
ISSN: | 0957-4174 1873-6793 |
DOI: | 10.1016/j.eswa.2020.114482 |
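
The summary attributes DE's success to its candidate generation mechanism and names G-Mean as one of the metrics. The following is a minimal sketch of how a textbook DE/rand/1 mutation with binomial crossover could produce synthetic minority samples, together with the usual G-Mean definition (square root of sensitivity times specificity). It is not the authors' DEBOHID procedure, whose details are not given in this record. The function names `de_oversample` and `g_mean`, and the parameter values `F=0.5` and `CR=0.9`, are assumptions chosen for illustration.

```python
import numpy as np


def de_oversample(minority, n_new, F=0.5, CR=0.9, seed=0):
    """Create n_new synthetic samples from minority rows using DE-style steps.

    Illustrative only: standard DE/rand/1 mutation plus binomial crossover,
    with hypothetical defaults for the scale factor F and crossover rate CR.
    """
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    n, d = minority.shape
    if n < 4:
        raise ValueError("need at least 4 minority samples")
    synthetic = np.empty((n_new, d))
    for i in range(n_new):
        # Pick a target and three other samples, all distinct.
        idx = rng.choice(n, size=4, replace=False)
        target, a, b, c = minority[idx]
        # Mutation: perturb one sample by a scaled difference of two others.
        mutant = a + F * (b - c)
        # Binomial crossover: take each feature from the mutant with
        # probability CR, forcing at least one mutant feature.
        mask = rng.random(d) < CR
        mask[rng.integers(d)] = True
        synthetic[i] = np.where(mask, mutant, target)
    return synthetic


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity for binary labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    pos = y_true == positive
    neg = ~pos
    sensitivity = np.mean(y_pred[pos] == positive)
    specificity = np.mean(y_pred[neg] != positive)
    return float(np.sqrt(sensitivity * specificity))
```

In use, the synthetic rows would be appended to the training set with the minority label before fitting a classifier. Because G-Mean multiplies the two per-class rates, a classifier that ignores the minority class scores 0 however accurate it is on the majority class, which is why the metric suits imbalanced data.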