Split Bregman method for large scale fused Lasso

Bibliographic details
Published in:	Computational Statistics & Data Analysis, April 2011, Vol. 55, No. 4, pp. 1552-1569
Main authors:	Ye, Gui-Bo; Xie, Xiaohui
Format:	Article
Language:	English
Keywords:	ℓ1-norm; Algorithms; Arrays; Bregman iteration; Computation; Exact sciences and technology; Fused Lasso; Fused Lasso support vector classifier; General topics; Mathematical analysis; Mathematical models; Mathematics; Multivariate analysis; Numerical analysis; Numerical analysis. Scientific computation; Numerical linear algebra; Numerical methods in probability and statistics; Order disorder; Probability and statistics; Samples; Sciences and techniques of general use; Solvers; Statistical methods; Statistics

Description
Abstract:	Ordering of regression or classification coefficients occurs in many real-world applications. Fused Lasso exploits this ordering by explicitly regularizing the differences between neighboring coefficients through an ℓ1-norm regularizer. However, due to the nonseparability and nonsmoothness of the regularization term, solving the fused Lasso problem is computationally demanding. Existing solvers can only deal with problems of small or medium size, or with a special case of the fused Lasso problem in which the predictor matrix is the identity matrix. In this paper, we propose an iterative algorithm based on the split Bregman method to solve a class of large-scale fused Lasso problems, including a generalized fused Lasso and a fused Lasso support vector classifier. We derive our algorithm using an augmented Lagrangian method and prove its convergence properties. The performance of our method is tested on both artificial data and real-world applications, including proteomic data from mass spectrometry and genomic data from array comparative genomic hybridization (array CGH). We demonstrate that our method is many times faster than the existing solvers, and show that it is especially efficient for large p, small n problems, where p is the number of variables and n is the number of samples.
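The abstract does not give the update equations, so the following is only a minimal sketch of how a split Bregman (augmented Lagrangian) scheme can be applied to a fused Lasso with squared-error loss: minimize (1/2)||y - X beta||^2 + lam1 ||beta||_1 + lam2 sum_j |beta[j+1] - beta[j]|. The sketch introduces auxiliary variables a = beta and b = L beta, where L is the first-difference matrix. It then alternates a linear solve for beta, soft-thresholding for a and b, and Bregman (dual) updates for u and v. This is not the authors' implementation. The function names, the fixed penalty parameters mu1 and mu2, the dual step size equal to mu, and the dense np.linalg.solve are all illustrative assumptions. The support vector classifier variant mentioned in the abstract is not covered.

```python
import numpy as np


def soft_threshold(z, t):
    """Elementwise soft-thresholding: the proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def first_difference_matrix(p):
    """(p-1) x p matrix L with (L @ beta)[j] = beta[j+1] - beta[j]."""
    L = np.zeros((p - 1, p))
    idx = np.arange(p - 1)
    L[idx, idx] = -1.0
    L[idx, idx + 1] = 1.0
    return L


def fused_lasso_objective(X, y, beta, lam1, lam2):
    """0.5*||y - X beta||^2 + lam1*||beta||_1 + lam2*||L beta||_1."""
    resid = y - X @ beta
    return (0.5 * resid @ resid
            + lam1 * np.sum(np.abs(beta))
            + lam2 * np.sum(np.abs(np.diff(beta))))


def fused_lasso_split_bregman(X, y, lam1, lam2, mu1=1.0, mu2=1.0,
                              n_iter=500, tol=1e-8):
    """Hypothetical split Bregman sketch for the squared-loss fused Lasso.

    Constraints a = beta and b = L beta are enforced through the
    augmented Lagrangian with (unscaled) dual variables u and v.
    """
    n, p = X.shape
    L = first_difference_matrix(p)
    # The beta-subproblem matrix is the same in every iteration.
    A = X.T @ X + mu1 * np.eye(p) + mu2 * (L.T @ L)
    Xty = X.T @ y
    beta = np.zeros(p)
    a, u = np.zeros(p), np.zeros(p)
    b, v = np.zeros(p - 1), np.zeros(p - 1)
    for _ in range(n_iter):
        # beta-step: solve (X'X + mu1 I + mu2 L'L) beta = X'y + mu1 a - u + L'(mu2 b - v)
        rhs = Xty + mu1 * a - u + L.T @ (mu2 * b - v)
        beta_new = np.linalg.solve(A, rhs)
        Lb = L @ beta_new
        # a- and b-steps: closed-form shrinkage
        a = soft_threshold(beta_new + u / mu1, lam1 / mu1)
        b = soft_threshold(Lb + v / mu2, lam2 / mu2)
        # Bregman / dual updates on the constraint residuals
        u = u + mu1 * (beta_new - a)
        v = v + mu2 * (Lb - b)
        # Stop when both the constraint residual and the change in beta are small
        primal = np.sqrt(np.sum((beta_new - a) ** 2) + np.sum((Lb - b) ** 2))
        change = np.linalg.norm(beta_new - beta)
        beta = beta_new
        if max(primal, change) <= tol * max(1.0, np.linalg.norm(beta)):
            break
    return beta
```

Passing np.eye(p) as X gives the fused Lasso signal approximator, which is the identity-predictor special case the abstract mentions. The dense p x p solve here does not reflect the large p, small n efficiency reported in the abstract. In that regime one could instead factor the tridiagonal part mu1 I + mu2 L'L and handle the rank-n term X'X with the Woodbury identity, though that is an assumption about how one might scale the sketch, not a description of the paper.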
ISSN:	0167-9473, 1872-7352
DOI:	10.1016/j.csda.2010.10.021