Breakdown Point of Model Selection When the Number of Variables Exceeds the Number of Observations

Bibliographic Details
Main Authors: Donoho, D., Stodden, V.
Format: Conference Proceedings
Language: English
Description
Summary: The classical multivariate linear regression problem assumes p variables X_1, X_2, ..., X_p and a response vector y, each with n observations, and a linear relationship between the two: y = Xβ + z, where z ~ N(0, σ²). We point out that when p > n, there is a breakdown point for standard model selection schemes, such that model selection only works well below a certain critical complexity level depending on n/p. We apply this notion to some standard model selection algorithms (Forward Stepwise, LASSO, LARS) in the case where p ≫ n. We find that 1) the breakdown point is well-defined for random X-models and low noise, 2) increasing noise shifts the breakdown point to lower levels of sparsity and reduces the model recovery ability of the algorithm in a systematic way, and 3) below breakdown, the size of coefficient errors follows the theoretical error distribution for the classical linear model.
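The sparse-recovery setup the abstract describes can be illustrated with a minimal sketch. This is not the authors' experiment: it uses a simplified greedy forward-stepwise selector (refitting by least squares after each added variable, in the style of orthogonal matching pursuit) on a hypothetical random Gaussian design with assumed dimensions n = 100, p = 200, and k = 3 nonzero coefficients, i.e. a sparsity level well below the breakdown point at low noise:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 100, 200, 3                      # p > n: more variables than observations
X = rng.standard_normal((n, p))
X /= np.linalg.norm(X, axis=0)             # normalize columns
beta = np.zeros(p)
support = rng.choice(p, size=k, replace=False)
beta[support] = 2.0                        # k nonzero coefficients (assumed magnitude)
y = X @ beta + 1e-3 * rng.standard_normal(n)   # low-noise regime, as in finding 1)

def forward_stepwise(X, y, k):
    """Greedy forward stepwise selection: repeatedly add the column most
    correlated with the current residual, then refit by least squares."""
    active, residual = [], y.copy()
    for _ in range(k):
        corr = np.abs(X.T @ residual)
        corr[active] = -np.inf             # never pick the same column twice
        active.append(int(np.argmax(corr)))
        coef, *_ = np.linalg.lstsq(X[:, active], y, rcond=None)
        residual = y - X[:, active] @ coef
    active = sorted(active)
    coef, *_ = np.linalg.lstsq(X[:, active], y, rcond=None)
    return active, coef

est_support, est_coef = forward_stepwise(X, y, k)
print("recovered support:", est_support)
```

At this sparsity level the greedy selector recovers the true support and the coefficient errors stay at the noise scale; raising the noise level or the number of nonzeros k pushes the problem past the breakdown point the paper studies, and recovery fails.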
ISSN: 2161-4393, 2161-4407
DOI: 10.1109/IJCNN.2006.246934