Affinity and class probability-based fuzzy support vector machine for imbalanced data sets

The learning problem from imbalanced data sets poses a major challenge in data mining community. Although conventional support vector machine can generally show relatively robust performance in dealing with the classification problems of imbalanced data sets, it treats all training samples with the...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Neural networks 2020-02, Vol.122, p.289-307
Hauptverfasser: Tao, Xinmin, Li, Qing, Ren, Chao, Guo, Wenjie, He, Qing, Liu, Rui, Zou, Junrong
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:The learning problem from imbalanced data sets poses a major challenge in data mining community. Although conventional support vector machine can generally show relatively robust performance in dealing with the classification problems of imbalanced data sets, it treats all training samples with the same contribution for learning, which results in the final decision boundary biasing toward the majority class especially in the presence of outliers or noises. In this paper, we propose a new affinity and class probability-based fuzzy support vector machine technique (ACFSVM). The affinity of a majority class sample is calculated according to support vector description domain (SVDD) model trained only by the given majority class training samples in kernel space similar to that used for FSVM learning. The obtained affinity can be used for identifying possible outliers and some border samples existing in the majority class training samples. In order to eliminate the effect of noises, we employ the kernel k-nearest neighbor method to determine the class probability of the majority class samples in the same kernel space as before. The samples with lower class probabilities are more likely to be noises and their contribution for learning seems to be reduced by their low memberships constructed by combining the affinities and the class probabilities. Thus, ACFSVM can pay more attention to the majority class samples with higher affinities and class probabilities while reducing their effects of the ones with lower affinities and class probabilities, eventually skewing the final classification boundary toward the majority class. In addition, the minority class samples are assigned relative high memberships to guarantee their importance for the model learning. The extensive experimental results on the different imbalanced datasets from UCI repository demonstrate that the proposed approach can achieve better generalization performance in terms of G-Mean, F-Measure, and AUC as compared to the other existing imbalanced dataset classification techniques. •ACFSVM focuses on the majority class with higher affinities and class probabilities.•ACFSVM can skew the final classification boundary toward the majority class.•Affinity based on SVDD and class probability based on kernelknn are used in ACFSVM.•Minority class samples are assigned high memberships to guarantee their importance.•ACFSVM achieves better generalization performance in imbalanced classification.
ISSN:0893-6080
1879-2782
DOI:10.1016/j.neunet.2019.10.016