Improving imbalanced classification using near-miss instances

Bibliographic Details
Published in: Expert Systems with Applications, September 2022, Vol. 201, Article 117130
Authors: Tanimoto, Akira; Yamada, So; Takenouchi, Takashi; Sugiyama, Masashi; Kashima, Hisashi
Format: Article
Language: English
Online access: Full text
Abstract: Class imbalance is a major issue in classification: the small sample size of a rare (positive) class is often a performance bottleneck. In real-world situations, however, “near-miss” positive instances, i.e., negative but nearly-positive instances, are sometimes plentiful. For example, natural disasters such as floods are rare, while near-miss cases, in which an actual flood did not occur but the water level approached the bank height, are relatively plentiful. We show that even when true positive cases are quite limited, as in disaster forecasting, accuracy can be improved by obtaining refined label-like side-information, “positivity” (e.g., the water level of the river), to distinguish near-miss cases from other negatives. Conventional cost-sensitive classification cannot utilize such side-information, and the small size of the positive sample causes high estimation variance. Our approach is in line with learning using privileged information (LUPI), which exploits side-information during training without predicting the side-information itself. We theoretically prove that our method reduces the estimation variance, provided that near-miss positive instances are plentiful, in exchange for additional bias. Results of extensive experiments demonstrate that our method tends to outperform, or compares favorably with, existing approaches.
Highlights:
• Class-imbalanced classification can be improved by utilizing “near-miss” instances.
• Side-information “positivity” is assumed for each instance and identifies near-misses.
• Treating near-misses as partly positive reduces the estimation variance.
• A non-asymptotic bound and extensive experiments show the superior performance.
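To make the core idea concrete, below is a minimal sketch of one way to treat near-miss negatives as partly positive: assign each instance a soft target based on its positivity score and fit a weighted classifier. The synthetic data, the positivity threshold 0.8, and the partial weight 0.5 are illustrative assumptions for this sketch, not the paper's estimator or experimental setup.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Synthetic stand-in data: X are features, s is the side-information
    # "positivity" score (e.g., normalized water level), and y marks the
    # rare true positives. All values here are made up for illustration.
    rng = np.random.default_rng(0)
    n = 1000
    X = rng.normal(size=(n, 5))
    s = 1.0 / (1.0 + np.exp(-(X @ rng.normal(size=5))))  # positivity in (0, 1)
    y = (s > 0.95).astype(int)                            # rare positive class

    # Near-misses: negative instances whose positivity exceeds a threshold
    # (0.8 is an assumed value). Give them a partial-positive soft target.
    near_miss = (y == 0) & (s > 0.8)
    t = y.astype(float)
    t[near_miss] = 0.5  # assumed partial weight; controls a bias-variance trade-off

    # Fit soft targets with scikit-learn by duplicating each row once as a
    # positive with weight t and once as a negative with weight 1 - t.
    X_dup = np.vstack([X, X])
    y_dup = np.concatenate([np.ones(n), np.zeros(n)])
    w_dup = np.concatenate([t, 1.0 - t])
    clf = LogisticRegression().fit(X_dup, y_dup, sample_weight=w_dup)
    print("predicted positive rate:", clf.predict(X).mean())

The row-duplication step is just a standard way to fit fractional targets with an off-the-shelf logistic regression; the paper's actual method comes with a bias-variance analysis that this toy sketch does not reproduce.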
ISSN: 0957-4174 (print), 1873-6793 (electronic)
DOI: 10.1016/j.eswa.2022.117130