ODBOT: Outlier detection-based oversampling technique for imbalanced datasets learning

Bibliographic details
Published in: Neural computing & applications 2021-11, Vol.33 (22), p.15781-15806
Main author: Ibrahim, Mohammed H.
Format: Article
Language: English
Description
Summary: In many real-world problems, datasets are imbalanced: the majority classes contain far more samples than the minority classes. Machine learning and data mining classification algorithms generally perform poorly on such datasets. In recent years, various oversampling techniques have been developed to address the class imbalance problem. However, few of them account for the relationships between classes or exploit the correlation between attributes, and most do not handle multi-class imbalanced datasets. To this end, this paper proposes a simple but effective outlier detection-based oversampling technique (ODBOT) for the multi-class imbalance problem. ODBOT detects outlier samples by clustering within the minority class(es) and then generates synthetic samples with these outliers taken into account. By analyzing the dissimilarity relationships among the attribute values of all classes, ODBOT generates efficient and consistent synthetic samples for the minority class(es). It can also reduce the risk of overlap between class regions and thereby help build a better classification model. ODBOT is evaluated in extensive experiments on 60 commonly used imbalanced datasets with five classification algorithms. The results show that ODBOT consistently outperformed other common and state-of-the-art techniques across various evaluation criteria.
ISSN: 0941-0643, 1433-3058
DOI: 10.1007/s00521-021-06198-x
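
Illustrative sketch (not part of the catalog record, and not the published ODBOT algorithm): the summary describes two stages, detecting outliers by clustering within a minority class and then generating synthetic samples with those outliers in mind. The Python below is a loose illustration of that general idea only. Everything specific in it is an assumption made for the example: the plain k-means clustering, the distance threshold `factor`, the SMOTE-style interpolation between non-outlier points, and all function names (`kmeans`, `split_outliers`, `oversample`) are hypothetical and do not come from the paper.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means on a list of tuples (assumed clustering step)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)

    def assign():
        # Label each point with the index of its nearest centroid.
        return [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                for p in points]

    for _ in range(iters):
        labels = assign()
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return centroids, assign()


def split_outliers(points, k=1, factor=2.0, seed=0):
    """Flag points whose distance to their centroid exceeds
    factor * (mean distance in that cluster). Threshold rule is assumed."""
    centroids, labels = kmeans(points, k, seed=seed)
    dists = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    inliers, outliers = [], []
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        if not idx:
            continue
        mean_d = sum(dists[i] for i in idx) / len(idx)
        for i in idx:
            if mean_d > 0 and dists[i] > factor * mean_d:
                outliers.append(points[i])
            else:
                inliers.append(points[i])
    return inliers, outliers


def oversample(points, n_new, k=1, factor=2.0, seed=0):
    """Create n_new synthetic minority samples by interpolating between
    random pairs of non-outlier points (SMOTE-style, assumed here).
    Needs at least two non-outlier points."""
    inliers, outliers = split_outliers(points, k, factor, seed)
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(inliers, 2)
        t = rng.random()
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return synthetic, outliers


if __name__ == "__main__":
    minority = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
                (0.05, 0.05), (0.2, 0.1), (10.0, 10.0)]
    new_points, flagged = oversample(minority, n_new=5)
    print("flagged as outliers:", flagged)
    print("synthetic samples:", new_points)
```

Keeping flagged outliers out of the interpolation is one simple way to avoid placing synthetic points far from the bulk of the minority class; how the paper itself uses the detected outliers is not specified in the summary above.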