Improving classification rate constrained to imbalanced data between overlapped and non-overlapped regions by hybrid algorithms
Published in: | Neurocomputing (Amsterdam), 2015-03, Vol. 152, pp. 429-443 |
---|---|
Main authors: | , , , |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Full text |
Abstract: | A new aspect of imbalanced data classification was studied. Unlike classical imbalanced data classification, where the problem is caused by the difference in class sizes, our study concerns only the situation in which two classes overlap. When one class overlaps another, the overlap induces three regions. The first is the overlapped region between the two classes. The other two are the non-overlapped regions of each class. Here the imbalance is caused by the different amounts of data in the overlapped region and the non-overlapped regions. In this situation, the difference in data sizes between classes is not the main concern and has no effect on classification accuracy. In this research, a combined technique, called the Soft-Hybrid algorithm, was proposed for improving classification performance. The technique was divided into two main phases: boundary region determination, and responsive classification algorithms for each sub-area. In the first phase, data were grouped as (1) non-overlapping data, (2) borderline data, and (3) overlapping data. The data were learned using a modified Hausdorff Distance, a Radial Basis Function (RBF) Network, and the K-Means clustering technique with Mahalanobis Distance. Then, a modified Kernel Learning Method, a modified DBSCAN, and an RBF network were applied to classify the data into proper groups based on statistical values from the classification phase. Finally, the results of all techniques were combined. The experimental results illustrated that the proposed method can significantly improve the effectiveness of classifying imbalanced data with large overlapping sections, as measured by TP rate, F-measure, and G-mean. Moreover, the computational times of the proposed method were lower than those of the standard algorithms used for this type of problem. |
---|---|
ISSN: | 0925-2312, 1872-8286 |
DOI: | 10.1016/j.neucom.2014.10.007 |
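
The abstract's two ideas, splitting data into non-overlapping, borderline, and overlapping groups and scoring results by TP rate, F-measure, and G-mean, can be illustrated with a minimal sketch. This is not the Soft-Hybrid algorithm: the grouping below swaps in a simple k-nearest-neighbour heuristic (the fraction of neighbours from the other class) in place of the paper's modified Hausdorff Distance, RBF network, and K-Means steps. The function names `label_regions` and `tp_rate_f_gmean`, the parameter `k`, and the 0.5 threshold are assumptions made for illustration only. The metric formulas are the standard confusion-matrix definitions.

```python
import numpy as np


def label_regions(X, y, k=5):
    """Label each point 'non-overlapping', 'borderline', or 'overlapping'.

    Assumption: the share of the k nearest neighbours belonging to the
    other class is used as a stand-in for the paper's boundary region
    determination phase. It is not the method described in the abstract.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise squared Euclidean distances between all points.
    diff = X[:, None, :] - X[None, :, :]
    d2 = (diff ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbour
    nn = np.argsort(d2, axis=1)[:, :k]
    # Fraction of neighbours whose class differs from the point's class.
    other_frac = (y[nn] != y[:, None]).mean(axis=1)
    return np.where(
        other_frac == 0.0,
        "non-overlapping",
        np.where(other_frac < 0.5, "borderline", "overlapping"),
    )


def _ratio(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return float(num) / float(den) if den else 0.0


def tp_rate_f_gmean(y_true, y_pred, positive=1):
    """Compute TP rate (recall), F-measure, and G-mean for a binary task."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == positive) & (y_pred == positive))
    fn = np.sum((y_true == positive) & (y_pred != positive))
    fp = np.sum((y_true != positive) & (y_pred == positive))
    tn = np.sum((y_true != positive) & (y_pred != positive))
    tp_rate = _ratio(tp, tp + fn)
    tn_rate = _ratio(tn, tn + fp)
    precision = _ratio(tp, tp + fp)
    f_measure = _ratio(2 * precision * tp_rate, precision + tp_rate)
    g_mean = float(np.sqrt(tp_rate * tn_rate))
    return tp_rate, f_measure, g_mean


if __name__ == "__main__":
    # Two 1-D clusters, with one class-1 point placed inside class 0.
    X = np.array([[0.0], [0.1], [0.2], [0.3], [0.15],
                  [10.0], [10.1], [10.2], [10.3]])
    y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1])
    print(label_regions(X, y, k=2))
    print(tp_rate_f_gmean([1, 1, 0, 0], [1, 0, 0, 0]))
```

A per-region classifier could then be trained on each group separately, which is the general shape of the second phase the abstract describes. The choice of classifier for each group is left open here.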