Wasserstein Distributionally Robust Multiclass Support Vector Machine

Bibliographic Details
Published in:	arXiv.org 2024-09
Main authors:	Ibrahim, Michael; Rozas, Heraldo; Gebraeel, Nagi
Format:	Article
Language:	English
Subjects:	Algorithms; Labels; Robustness; Support vector machines; Upper bounds
Online access:	Full text

Description
Summary:	We study multiclass classification in settings where the data features \(\mathbf{x}\) and their labels \(\mathbf{y}\) are uncertain. We identify that distributionally robust one-vs-all (OVA) classifiers often struggle in settings with imbalanced data. To address this issue, we use Wasserstein distributionally robust optimization to develop a robust version of the multiclass support vector machine (SVM) characterized by the Crammer-Singer (CS) loss. First, we prove that the CS loss is bounded from above by a Lipschitz continuous function for all \(\mathbf{x} \in \mathcal{X}\) and \(\mathbf{y} \in \mathcal{Y}\). Next, we exploit strong duality results to express the dual of the worst-case risk problem. We then show that the worst-case risk minimization problem admits a tractable convex reformulation due to the regularity of the CS loss. Moreover, we develop a kernel version of our proposed model to account for nonlinear class separation, and we show that it admits a tractable convex upper bound. We also propose a projected subgradient method for a special case of our proposed linear model to improve scalability. Our numerical experiments demonstrate that our model outperforms state-of-the-art OVA models in settings where the training data is highly imbalanced. We also show through experiments on popular real-world datasets that our proposed model often outperforms its regularized counterpart, because the former accounts for uncertain labels whereas the latter does not.
ISSN:	2331-8422
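
Note on the Crammer-Singer loss: the summary names the CS loss without stating it. As a minimal sketch, assuming the standard textbook form of the loss for a linear model (the maximum margin-augmented class score minus the score of the true class), it can be computed as below. The function name, the row-per-class layout of `W`, and the toy numbers are illustrative assumptions, not taken from the paper, and the sketch covers only the nominal loss, not the distributionally robust reformulation.

```python
def crammer_singer_loss(W, x, y):
    """Crammer-Singer multiclass hinge loss for a single sample.

    W: list of per-class weight vectors (assumed layout: one row per class)
    x: feature vector
    y: integer index of the true class
    """
    # Linear score of each class: w_k . x
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in W]
    # Margin-augmented scores: add 1 to every class except the true one.
    augmented = [s + (0.0 if k == y else 1.0) for k, s in enumerate(scores)]
    # The loss is zero only if the true class beats every other class by a margin of 1.
    return max(augmented) - scores[y]


# Toy example with three classes and two features (hypothetical values).
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
x = [2.0, 0.5]
print(crammer_singer_loss(W, x, 0))  # 0.0: class 0 wins with margin at least 1
print(crammer_singer_loss(W, x, 1))  # 2.5: class 0 outscores the labeled class 1
```

Unlike a one-vs-all scheme, which trains one binary problem per class, this loss couples all class scores in a single objective.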