Improving Sparsity and Scalability in Regularized Nonconvex Truncated-Loss Learning Problems



Bibliographic Details
Published in: IEEE Transactions on Neural Networks and Learning Systems, July 2018, Vol. 29 (7), pp. 2782-2793
Main authors: Tao, Qing; Wu, Gaowei; Chu, Dejun
Format: Article
Language: English
Description
Abstract: The truncated regular L1-loss support vector machine can eliminate the excessive number of support vectors (SVs); thus, it has significant advantages in robustness and scalability. However, in this paper, we discover that the associated state-of-the-art solvers, such as the difference of convex algorithm and the concave-convex procedure, not only have a limited sparsity-promoting property for general truncated losses, especially the L2-loss, but also scale poorly to large-scale problems. To circumvent these drawbacks, we present a general multistage scheme with an explicit interpretation of SVs as well as outliers. In particular, we solve the general nonconvex truncated-loss minimization through a sequence of associated convex subproblems in which the outliers are removed in advance. The proposed algorithm can be regarded as a structural optimization attempt that carefully considers the sparsity imposed by the nonconvex truncated losses. We show that this general multistage algorithm offers sufficient sparsity, especially for the truncated L2-loss. To further improve scalability, we propose a linear multistep algorithm that employs a single iteration of coordinate descent to monotonically decrease the objective function at each stage, and a kernel algorithm that uses the Karush-Kuhn-Tucker conditions to cheaply find most of the outliers for the next stage. Comparison experiments demonstrate that our methods are superior in both sparsity and scalability.
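As a rough illustration of the multistage idea summarized above (not the authors' exact solver), the following Python sketch alternates between fitting a convex L2-loss SVM on the currently retained points and flagging as outliers the points whose squared hinge loss exceeds an assumed truncation level s^2. The threshold s, the stopping rule, and the use of scikit-learn's LinearSVC for the convex subproblem are all illustrative assumptions.

# Hypothetical sketch of a multistage truncated L2-loss linear SVM.
# At each stage, points whose current squared hinge loss exceeds the assumed
# truncation level s**2 are treated as outliers and removed, and a convex
# L2-loss SVM is refit on the remaining points. This only illustrates the
# "remove outliers in advance, then solve a convex subproblem" idea.
import numpy as np
from sklearn.svm import LinearSVC

def multistage_truncated_l2_svm(X, y, C=1.0, s=2.0, max_stages=10):
    """y must take values in {-1, +1}; s is the assumed truncation threshold."""
    keep = np.ones(len(y), dtype=bool)           # start by retaining all points
    clf = LinearSVC(C=C, loss="squared_hinge")
    for stage in range(max_stages):
        clf.fit(X[keep], y[keep])                # convex subproblem on non-outliers
        margin = y * clf.decision_function(X)    # y_i * f(x_i) for every point
        loss = np.maximum(0.0, 1.0 - margin) ** 2
        new_keep = loss <= s ** 2                # outliers: loss hits the truncation cap
        if np.array_equal(new_keep, keep):       # outlier set stabilized -> stop
            break
        keep = new_keep
    return clf, ~keep                            # final model and outlier mask

For example, multistage_truncated_l2_svm(X, y, C=1.0, s=2.0) returns the final linear model together with a boolean mask of the points treated as outliers; points flagged as outliers no longer influence the convex subproblem at later stages.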
ISSN: 2162-237X, 2162-2388
DOI: 10.1109/TNNLS.2017.2705429