Augmenting the diversity of imbalanced datasets via multi-vector stochastic exploration oversampling

Bibliographic details
Published in: Neurocomputing (Amsterdam), May 2024, Vol. 583, Article 127600
Main authors: Li, Hongrui; Wang, Shuangxin; Jiang, Jiading; Deng, Chuiyi; Ou, Junmei; Zhou, Ziang; Yu, Dingli
Format: Article
Language: English
Description
Abstract: The problem of class imbalance is prevalent in many real-world datasets; it causes learning models to skew towards the majority class, resulting in biased performance. Data augmentation methods, such as the well-known Synthetic Minority Over-sampling Technique (SMOTE), are commonly employed to address class imbalance by generating synthetic samples. However, SMOTE's generation mechanism is relatively constrained, resulting in insufficient diversity among the synthetic samples. To overcome this limitation, this paper extends the classical SMOTE and introduces a novel generalized version, Multi-vector Stochastic Exploration Oversampling (MSEO). In SMOTE, synthetic samples are produced by mappings whose direction vectors and scaling vectors are determined by neighboring samples; MSEO broadens this set to a collection obtained through mappings with random direction vectors and scaling vectors. This allows the generated samples to escape the original linear interpolation region and explore the sample space more flexibly. We extensively evaluated the method on various types of datasets, including artificially generated datasets, multi-class real-world datasets, and an engineering dataset. The results indicate that MSEO offers significant advantages in improving classification performance and promoting diversity in the synthetic samples.
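To make the contrast in the abstract concrete, the following Python sketch places the classical SMOTE interpolation step next to a hypothetical multi-vector variant. The function names `smote_sample` and `multi_vector_sample`, the averaging over neighbors, and the uniform per-feature scaling factors are illustrative assumptions; this is not the published MSEO algorithm, whose exact sampling distributions are not given in the abstract. Neighbor selection (e.g. the k nearest minority-class neighbors) is assumed to have happened already.

```python
import random
from typing import List, Sequence

Vector = List[float]


def smote_sample(x: Sequence[float], neighbor: Sequence[float],
                 rng: random.Random) -> Vector:
    """Classical SMOTE step: x + u * (neighbor - x) with one scalar u ~ U(0, 1).

    The result always lies on the line segment between x and its neighbor.
    """
    u = rng.random()
    return [xi + u * (ni - xi) for xi, ni in zip(x, neighbor)]


def multi_vector_sample(x: Sequence[float],
                        neighbors: Sequence[Sequence[float]],
                        rng: random.Random) -> Vector:
    """Hypothetical multi-vector variant (an illustration, NOT the published MSEO).

    Each neighbor contributes a difference vector (n_k - x) that is scaled
    feature by feature with random factors in [0, 1); the scaled vectors are
    averaged and added to x, so the sample can leave the SMOTE segment.
    """
    if not neighbors:
        raise ValueError("need at least one neighbor")
    k = len(neighbors)
    out = list(x)
    for nb in neighbors:
        for j, (xj, nj) in enumerate(zip(x, nb)):
            s = rng.random()  # random per-feature scaling factor
            out[j] += s * (nj - xj) / k
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    x = [0.0, 0.0]
    print(smote_sample(x, [2.0, 4.0], rng))
    print(multi_vector_sample(x, [[2.0, 0.0], [0.0, 2.0]], rng))
```

With a single scalar gap, every SMOTE sample lies on the segment between `x` and its neighbor. In the hypothetical variant, the per-feature scaling changes the direction of each difference vector, so for the two neighbors above a sample can land anywhere in the square [0, 1) × [0, 1) rather than on one of two segments, which is the kind of extra diversity the abstract attributes to random direction and scaling vectors.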
ISSN: 0925-2312, 1872-8286
DOI: 10.1016/j.neucom.2024.127600