Data-Dependent Generalization Bounds for Multi-Class Classification
| Published in: | IEEE Transactions on Information Theory, 2019-05, Vol. 65 (5), pp. 2995-3021 |
|---|---|
| Format: | Article |
| Language: | English |
| Online access: | Order full text |
| Abstract: | In this paper, we study data-dependent generalization error bounds that exhibit a mild dependency on the number of classes, making them suitable for multi-class learning with a large number of label classes. The bounds generally hold for empirical multi-class risk minimization algorithms using an arbitrary norm as the regularizer. Key to our analysis are new structural results for multi-class Gaussian complexities and empirical ℓ∞-norm covering numbers, which exploit the Lipschitz continuity of the loss function with respect to the ℓ2- and ℓ∞-norms, respectively. We establish data-dependent error bounds in terms of the complexities of a linear function class defined on a finite set induced by the training examples, for which we show tight lower and upper bounds. We apply the results to several prominent multi-class learning machines and show a tighter dependency on the number of classes than the state of the art. For instance, for the multi-class support vector machine of Crammer and Singer (2002), we obtain a data-dependent bound with a logarithmic dependency, a significant improvement over the previous square-root dependency. Experimental results are reported to verify our theoretical findings. |
| ISSN: | 0018-9448 (print), 1557-9654 (electronic) |
| DOI: | 10.1109/TIT.2019.2893916 |
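
For orientation, two of the objects named in the abstract can be written out in standard notation; the notation below is ours and is only a sketch, not an excerpt from the paper. The empirical Gaussian complexity of a function class $F$ on a sample $S = (z_1, \dots, z_n)$ is usually defined as

$$
\widehat{\mathfrak{G}}_S(F) \;=\; \mathbb{E}_{g}\Big[\sup_{f \in F} \frac{1}{n} \sum_{i=1}^{n} g_i\, f(z_i)\Big], \qquad g_1, \dots, g_n \ \text{i.i.d.}\ \mathcal{N}(0, 1),
$$

and the multi-class support vector machine of Crammer and Singer (2002) fits one weight vector $w_j$ per class $j \in \{1, \dots, c\}$ and penalizes violations of the multi-class margin with the hinge loss

$$
\ell\big(w, (x, y)\big) \;=\; \max\Big(0,\ 1 - \big(\langle w_y, x \rangle - \max_{y' \neq y} \langle w_{y'}, x \rangle\big)\Big).
$$

This loss is Lipschitz continuous as a function of the score vector $(\langle w_1, x \rangle, \dots, \langle w_c, x \rangle)$, which is the property the abstract's analysis exploits; the reported improvement replaces the earlier square-root dependency of the bound on the number of classes $c$ with a logarithmic one.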