A systematic review for class-imbalance in semi-supervised learning

Bibliographic details
Published in: Artificial Intelligence Review, 2023-11, Vol. 56 (Suppl. 2), pp. 2349-2382
Main authors: de Oliveira, Willian Dihanster Gomes; Berton, Lilian
Format: Article
Language: English
Description
Abstract: This review aims to examine the state of the art of semi-supervised learning (SSL) techniques for addressing class-imbalanced data. Class imbalance is inherent in many real-world applications and has been extensively investigated in supervised classification. In a semi-supervised scenario, the problem is even more interesting because two outcomes are possible: either performance degrades as errors propagate to the unlabeled data, worsening the final result, or the unlabeled data help to represent the minority class and improve the results. However, to the best of our knowledge, no survey has yet organized the semi-supervised approaches that deal with class imbalance. Our goal is to fill this gap with a systematic review, for which we retrieved 444 articles published over five years (2017–2021) from the ACM Digital Library, IEEE Xplore, Elsevier, Springer, and Google Scholar. After applying exclusion criteria, 47 articles were selected and presented in more detail. We collect the information needed to answer four research questions, covering the existence of pre/post-processing techniques, the applications, the data sets explored, the metrics used to evaluate the approaches, and the techniques developed to deal with class imbalance. We propose eight categories (balancing, graph-based, loss, self-training, ensemble, active learning, post-processing, and other types of learning) to organize the different methodological approaches in the papers. Finally, we present a discussion and future trends in the area. Our review aims to provide an understanding of the most prominent and currently relevant work employing SSL for class imbalance.
ISSN: 0269-2821, 1573-7462
DOI: 10.1007/s10462-023-10579-0