Augmenting the diversity of imbalanced datasets via multi-vector stochastic exploration oversampling
Published in: | Neurocomputing (Amsterdam) 2024-05, Vol.583, p.127600, Article 127600 |
---|---|
Main authors: | , , , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | Class imbalance is prevalent in many real-world data sets; it skews learning models towards the majority class and biases their performance. Data augmentation methods, such as the well-known Synthetic Minority Over-sampling Technique (SMOTE), are commonly employed to address class imbalance by generating synthetic samples. However, the generation mechanism of SMOTE is relatively constrained, resulting in insufficient diversity among the synthetic samples. To overcome this limitation, this paper extends classical SMOTE and introduces a novel generalized version, namely Multi-vector Stochastic Exploration Oversampling (MSEO). Where SMOTE forms synthetic samples by mapping with direction vectors and scaling vectors determined by the neighboring samples, MSEO broadens this set to a collection obtained through mappings with random direction vectors and scaling vectors. This allows the generated samples to escape the original linear interpolation region, facilitating a more flexible exploration of the sample space. The method is evaluated extensively on various types of datasets, including artificially generated datasets, multi-class real-world datasets, and an engineering dataset. The results indicate that MSEO exhibits significant advantages in enhancing classification performance and promoting diversity in synthetic samples. |
---|---|
ISSN: | 0925-2312 1872-8286 |
DOI: | 10.1016/j.neucom.2024.127600 |
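
The contrast drawn in the abstract, between samples confined to the segment joining a minority point and its neighbor and samples that can leave that segment, can be illustrated with a minimal sketch. The first function is classical SMOTE interpolation. The second is only an illustrative assumption of what a "random direction plus random scaling" draw might look like; it is not the MSEO algorithm from the paper, whose exact mapping is not given in this record. All function and parameter names (`smote_sample`, `random_direction_sample`, `k`, `n_new`) are hypothetical.

```python
import numpy as np


def _pick_pairs(X_min, k, n_new, rng):
    """Pick n_new (base, neighbor) index pairs among the minority samples.

    Each neighbor is drawn from the k nearest minority neighbors of its
    base sample (Euclidean distance). Requires k < len(X_min).
    """
    n = len(X_min)
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)            # a point is not its own neighbor
    knn = np.argsort(dist, axis=1)[:, :k]     # indices of the k nearest neighbors
    base = rng.integers(0, n, size=n_new)
    neigh = knn[base, rng.integers(0, k, size=n_new)]
    return base, neigh


def smote_sample(X_min, k=5, n_new=100, rng=None):
    """Classical SMOTE: a random point on the segment base -> neighbor."""
    rng = np.random.default_rng(rng)
    base, neigh = _pick_pairs(X_min, k, n_new, rng)
    gap = rng.random((n_new, 1))              # scalar in [0, 1) per new sample
    return X_min[base] + gap * (X_min[neigh] - X_min[base])


def random_direction_sample(X_min, k=5, n_new=100, rng=None):
    """Illustrative assumption only, NOT the paper's MSEO.

    Replaces the fixed direction (neighbor - base) with a random unit
    direction, and keeps the step length bounded by the distance to the
    chosen neighbor, so samples can leave the interpolation segment.
    """
    rng = np.random.default_rng(rng)
    base, neigh = _pick_pairs(X_min, k, n_new, rng)
    radius = np.linalg.norm(X_min[neigh] - X_min[base], axis=1, keepdims=True)
    direction = rng.normal(size=(n_new, X_min.shape[1]))
    direction /= np.linalg.norm(direction, axis=1, keepdims=True)
    scale = rng.random((n_new, 1))            # random scaling in [0, 1)
    return X_min[base] + scale * radius * direction
```

For a minority-class array `X_min` of shape `(n_samples, n_features)`, `smote_sample(X_min)` always returns convex combinations of existing points, whereas `random_direction_sample(X_min)` can place points off the connecting segments, which is the kind of added diversity the abstract describes.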