Detecting cellwise outliers in multivariate and high-dimensional data

Bibliographic details
Main author: Van den Bossche, W
Format: Dissertation
Language: English
Description
Summary: Standard statistical techniques such as least squares regression are very accurate when their underlying distributional assumptions, such as Gaussianity, are satisfied. The assumption of Gaussian errors precludes outliers, which are observations that deviate from the fit suggested by the majority of the data. But real data often do contain outliers, which destroy the least squares fit. Nowadays data are often high-dimensional, and in that case the outliers are even harder to detect. For this reason robust estimators have been developed, which are less sensitive to outliers. As a side effect, the outliers can be detected by their residuals from the robust fit. Unfortunately, many robust methods currently require substantial computation time, so it is necessary to develop faster algorithms for them. There has been some progress in the construction of fast algorithms for robust linear regression and for the robust estimation of multivariate scatter matrices, but there is much room for improvement. This doctoral project aims to develop efficient algorithms for robust regression through the origin, for scatter matrices and principal components with given center, for sparse estimation and variable selection in high dimension, for robust low-rank approximation of multivariate data (a kind of singular value decomposition), and for robust estimation for data containing cellwise outliers.
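
The idea that outliers can be detected by their residuals from a robust fit can be illustrated with a small sketch. It is not the dissertation's algorithm: it assumes a simple regression through the origin, uses the median of the ratios y_i / x_i as a stand-in robust slope, and scales the residuals by the median absolute deviation. The function names, the toy data, and the cutoff of 2.5 are all illustrative assumptions.

```python
from statistics import median

def ls_slope_through_origin(x, y):
    """Classical least squares slope for the model y = b * x (no intercept)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

def robust_slope_through_origin(x, y):
    """Illustrative robust slope: median of the ratios y_i / x_i (x_i != 0)."""
    return median(yi / xi for xi, yi in zip(x, y) if xi != 0)

def flag_outliers(x, y, cutoff=2.5):
    """Flag points whose standardized residual from the robust fit is large.

    The cutoff of 2.5 is a common rule of thumb, assumed here for illustration.
    """
    slope = robust_slope_through_origin(x, y)
    residuals = [yi - slope * xi for xi, yi in zip(x, y)]
    # Median absolute residual, multiplied by 1.4826 so that it estimates
    # the standard deviation when the errors are Gaussian.
    scale = 1.4826 * median(abs(r) for r in residuals)
    if scale == 0:
        # Degenerate case: more than half of the points fit exactly.
        return slope, [r != 0 for r in residuals]
    return slope, [abs(r) / scale > cutoff for r in residuals]

# Toy data lying close to y = 2x, with one contaminated response (index 4).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 30.0, 12.1]

ls_slope = ls_slope_through_origin(x, y)
slope, flags = flag_outliers(x, y)
print("least squares slope:", round(ls_slope, 3))
print("robust slope:", round(slope, 3))
print("flagged indices:", [i for i, f in enumerate(flags) if f])
```

On this toy data the single contaminated point pulls the least squares slope well above 2, while the median-based slope stays close to 2 and only the contaminated point is flagged.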