Accuracy of Pseudo-Inverse Covariance Learning-A Random Matrix Theory Analysis

For many learning problems, estimates of the inverse population covariance are required and often obtained by inverting the sample covariance matrix. Increasingly for modern scientific data sets, the number of sample points is less than the number of features and so the sample covariance is not inve...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:IEEE transactions on pattern analysis and machine intelligence 2011-07, Vol.33 (7), p.1470-1481
1. Verfasser: Hoyle, D C
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:For many learning problems, estimates of the inverse population covariance are required and often obtained by inverting the sample covariance matrix. Increasingly for modern scientific data sets, the number of sample points is less than the number of features and so the sample covariance is not invertible. In such circumstances, the Moore-Penrose pseudo-inverse sample covariance matrix, constructed from the eigenvectors corresponding to nonzero sample covariance eigenvalues, is often used as an approximation to the inverse population covariance matrix. The reconstruction error of the pseudo-inverse sample covariance matrix in estimating the true inverse covariance can be quantified via the Frobenius norm of the difference between the two. The reconstruction error is dominated by the smallest nonzero sample covariance eigenvalues and diverges as the sample size becomes comparable to the number of features. For high-dimensional data, we use random matrix theory techniques and results to study the reconstruction error for a wide class of population covariance matrices. We also show how bagging and random subspace methods can result in a reduction in the reconstruction error and can be combined to improve the accuracy of classifiers that utilize the pseudo-inverse sample covariance matrix. We test our analysis on both simulated and benchmark data sets.
ISSN:0162-8828
1939-3539
DOI:10.1109/TPAMI.2010.186