Geometric SMOTE for regression

Bibliographic details
Published in:	Expert systems with applications 2022-05, Vol.193, p.116387, Article 116387
Main authors:	Camacho, Luís; Douzas, Georgios; Bacao, Fernando
Format:	Article
Language:	English
Keywords:	Algorithms; Classification; Continuity (mathematics); Data-level; Datasets; Imbalanced Learning; Regression
Online access:	Full text

Description
Abstract:	Learning from imbalanced data sets is known to be a challenging task. There are many proposals to tackle the challenge for classification problems, but for regression the solutions are few. In the context of regression, imbalanced learning means that there is a concern with the accurate prediction of the target values in a subset of the continuous target variable, considering that these values rarely occur in the data set. In this article, we extend the G-SMOTE algorithm, which is used in classification, to regression tasks. G-SMOTE is a pre-processing algorithm that differs from the SMOTE algorithm in that it allows the generation of synthetic instances in a geometric region around the selected instances rather than on the line segment that joins the two selected instances. The performance of G-SMOTE for regression was compared against other methods, and the empirical results show that our proposal outperformed those methods.
Highlights:
• Imbalanced learning is a well-known problem in classification.
• Regression problems can also be affected by the imbalanced nature of data.
• We propose a method to mitigate the imbalanced regression problem.
• Improves the prediction of rare and extreme values of a continuous target variable.
ISSN:	0957-4174 1873-6793
DOI:	10.1016/j.eswa.2021.116387
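
The contrast the abstract draws between SMOTE (a synthetic instance on the line segment joining two selected instances) and G-SMOTE (a synthetic instance in a geometric region around the selected instance) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names are hypothetical, the geometric region is simplified to a plain hypersphere (the published G-SMOTE has further parameters that are omitted here), and the rule for assigning a target value to the synthetic instance is an assumption of this sketch that the record does not confirm.

```python
import numpy as np


def smote_point(x, neighbor, rng):
    """SMOTE-style step: a random point on the segment from x to neighbor."""
    return x + rng.uniform(0.0, 1.0) * (neighbor - x)


def geometric_point(x, neighbor, rng):
    """Simplified geometric step (assumption): a uniform random point inside
    the hypersphere centered on x with radius equal to the distance to neighbor."""
    radius = np.linalg.norm(neighbor - x)
    # A normalized Gaussian vector gives a uniformly random direction.
    direction = rng.normal(size=x.shape)
    direction /= np.linalg.norm(direction)
    # Scaling by u ** (1 / d) makes the point uniform in volume, not in radius.
    scale = rng.uniform(0.0, 1.0) ** (1.0 / x.size)
    return x + radius * scale * direction


def interpolate_target(x_new, x, y, neighbor, y_neighbor):
    """Assumed target rule: inverse-distance-weighted mean of the two targets,
    so the closer parent instance contributes more to the synthetic target."""
    d1 = np.linalg.norm(x_new - x)
    d2 = np.linalg.norm(x_new - neighbor)
    if d1 + d2 == 0.0:
        return 0.5 * (y + y_neighbor)
    return (d2 * y + d1 * y_neighbor) / (d1 + d2)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    x, y = np.array([0.0, 0.0]), 10.0                # rare-target instance
    neighbor, y_nb = np.array([3.0, 4.0]), 14.0      # one of its neighbors
    for make_point in (smote_point, geometric_point):
        x_new = make_point(x, neighbor, rng)
        y_new = interpolate_target(x_new, x, y, neighbor, y_nb)
        print(make_point.__name__, x_new, y_new)
```

In this sketch the SMOTE-style point always lies between the two instances, while the geometric point can fall anywhere within the sphere around the selected instance, which is the wider sampling region the abstract refers to.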