A bounded influence regression estimator based on the statistics of the hat matrix
Many geophysical regression problems require the analysis of large (more than 104values) data sets, and, because the data may represent mixtures of concurrent natural processes with widely varying statistical properties, contamination of both response and predictor variables is common. Existing boun...
Gespeichert in:
Veröffentlicht in: | Applied statistics 2003-01, Vol.52 (3), p.307-322 |
---|---|
Hauptverfasser: | , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Many geophysical regression problems require the analysis of large (more than 104values) data sets, and, because the data may represent mixtures of concurrent natural processes with widely varying statistical properties, contamination of both response and predictor variables is common. Existing bounded influence or high breakdown point estimators frequently lack the ability to eliminate extremely influential data and/or the computational efficiency to handle large data sets. A new bounded influence estimator is proposed that combines high asymptotic efficiency for normal data, high breakdown point behaviour with contaminated data and computational simplicity for large data sets. The algorithm combines a standard M-estimator to downweight data corresponding to extreme regression residuals and removal of overly influential predictor values (leverage points) on the basis of the statistics of the hat matrix diagonal elements. For this, the exact distribution of the hat matrix diagonal elements$p_{ii}$for complex multivariate Gaussian predictor data is shown to be$\beta(p_{ii}, m, N - m)$, where N is the number of data and m is the number of parameters. Real geophysical data from an auroral zone magnetotelluric study which exhibit severe outlier and leverage point contamination are used to illustrate the estimator's performance. The examples also demonstrate the utility of looking at both the residual and the hat matrix distributions through quantile-quantile plots to diagnose robust regression problems. |
---|---|
ISSN: | 0035-9254 1467-9876 |
DOI: | 10.1111/1467-9876.00406 |