An effective method using clustering-based adaptive decomposition and editing-based diversified oversamping for multi-class imbalanced datasets
Published in: | Applied Intelligence (Dordrecht, Netherlands), 2021-04, Vol.51 (4), p.1918-1933 |
---|---|
Main authors: | , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | In multi-class imbalanced classification tasks, which occur in many real-world applications, the major challenges are class imbalance, which arises when some classes are less frequent than others, and class overlap, which arises when some classes contain a similar number of data samples. Both complicate the classification task. The decomposition-based strategy is an effective way to improve performance on multi-class imbalanced classification tasks. However, current studies based on this strategy have failed to solve the problems of class imbalance and class overlap simultaneously. Therefore, in this paper we propose an effective method, namely the clustering-based adaptive decomposition and editing-based diversified oversampling procedure (CluAD-EdiDO), to solve both problems. The proposed CluAD-EdiDO consists of two key components: clustering-based adaptive decomposition and an editing-based diversified oversampling technique. The former groups similar samples of the dataset into clusters (i.e., "sub-problems"). The latter is applied independently within each cluster to combat imbalance and overlap, reducing the impact of the majority classes in overlapping regions and oversampling the minority classes appropriately. Furthermore, a diversified ensemble learning framework is adopted to select the best classification algorithm for each sub-problem. Extensive experiments on 17 real-world datasets demonstrate that our method outperforms the compared methods on multi-class imbalanced datasets. |
---|---|
ISSN: | 0924-669X 1573-7497 |
DOI: | 10.1007/s10489-020-01883-1 |
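
The pipeline outlined in the summary (cluster the data into sub-problems, then edit and oversample within each cluster) can be illustrated with a minimal Python sketch. This is not the authors' CluAD-EdiDO implementation: plain k-means stands in for the adaptive decomposition, an edited-nearest-neighbour style rule stands in for the editing step, and SMOTE-like interpolation stands in for the diversified oversampling. The function names, the parameters (`k`, `k_clusters`, `seed`) and the toy data are all assumptions made for illustration, and the per-cluster classifier selection is omitted.

```python
import random
from collections import Counter, defaultdict

def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means (assumed stand-in); returns a cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old center if a cluster goes empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign

def edit_majority(X, y, k=3):
    """ENN-style editing (assumed stand-in): drop majority-class samples
    whose k nearest neighbours mostly carry a different label."""
    majority = Counter(y).most_common(1)[0][0]
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi == majority and len(X) > k:
            nbrs = sorted((j for j in range(len(X)) if j != i),
                          key=lambda j: dist2(xi, X[j]))[:k]
            if sum(y[j] != yi for j in nbrs) > k / 2:
                continue  # sample sits in an overlapping region, remove it
        keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]

def oversample(X, y, rng):
    """SMOTE-like interpolation until every class matches the largest one."""
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            a, b = rng.choice(pool), rng.choice(pool)
            t = rng.random()
            X_out.append(tuple(u + t * (v - u) for u, v in zip(a, b)))
            y_out.append(label)
    return X_out, y_out

def rebalance_by_cluster(X, y, k_clusters=2, seed=0):
    """Cluster first, then edit and oversample inside each cluster."""
    rng = random.Random(seed)
    assign = kmeans(X, k_clusters, seed=seed)
    groups = defaultdict(lambda: ([], []))
    for x, lab, c in zip(X, y, assign):
        groups[c][0].append(x)
        groups[c][1].append(lab)
    result = {}
    for c, (Xc, yc) in groups.items():
        Xe, ye = edit_majority(Xc, yc)
        result[c] = oversample(Xe, ye, rng)
    return result

# Toy data (hypothetical): two well-separated groups with skewed labels.
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.2, 0.3),
     (5.0, 5.0), (5.1, 5.2), (5.2, 4.9), (4.9, 5.1)]
y = ["a", "a", "a", "a", "b", "c", "c", "c", "b"]
balanced = rebalance_by_cluster(X, y)
```

In this sketch, `balanced` maps each cluster index to its edited and oversampled samples and labels, so a separate classifier could then be trained per cluster, in the spirit of the sub-problem framing in the summary.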