The Projected Covariance Measure for assumption-lean variable significance testing
Format: Article
Language: English
Abstract: Testing the significance of a variable or group of variables $X$ for
predicting a response $Y$, given additional covariates $Z$, is a ubiquitous
task in statistics. A simple but common approach is to specify a linear model
and then test whether the regression coefficient for $X$ is non-zero. However,
when the model is misspecified, the test may have poor power, for example when
$X$ is involved in complex interactions, or it may lead to many false
rejections. In this work we study the problem of testing the model-free null of
conditional mean independence, i.e. that the conditional mean of $Y$ given $X$
and $Z$ does not depend on $X$. We propose a simple and general framework that
can leverage flexible nonparametric or machine learning methods, such as
additive models or random forests, to yield both robust error control and high
power. The procedure uses these methods to perform regressions: first to
estimate a form of projection of $Y$ on $X$ and $Z$ using one half of the data,
and then to estimate the expected conditional covariance between this
projection and $Y$ on the remaining half of the data. While the approach is
general, we show that a version of our procedure using spline regression
achieves the rate that we prove to be minimax optimal in this nonparametric
testing problem. Numerical experiments demonstrate the effectiveness of our
approach, in terms of both Type I error control and power, compared with
several existing approaches.
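The two-stage, sample-splitting idea described in the abstract can be illustrated with a minimal sketch of a test of $H_0: \mathbb{E}[Y \mid X, Z] = \mathbb{E}[Y \mid Z]$. This is not the authors' implementation, and the paper's full procedure may include refinements not mentioned in the abstract. Several assumptions are made here: ordinary least squares with a hand-picked $X \cdot Z$ interaction term stands in for the flexible learners (additive models, random forests, splines), the helper names `_features`, `_ols` and `split_covariance_test` are hypothetical, and the one-sided normal-approximation p-value is a choice of this sketch. On half A, $Y$ is regressed on $(X, Z)$ and the part explained by $Z$ alone is subtracted, giving a projection-type function $\hat f(X, Z)$. On half B, $Y$ and $\hat f$ are both residualised on $Z$, and the mean of their products estimates the expected conditional covariance.

```python
import math

import numpy as np


def _features(z, x=None):
    """Hypothetical design matrix: [1, z], or [1, z, x, x*z] when x is given."""
    cols = [np.ones_like(z), z]
    if x is not None:
        cols += [x, x * z]
    return np.column_stack(cols)


def _ols(design, target):
    """Least-squares coefficients (stand-in for a flexible regression method)."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


def split_covariance_test(x, z, y, seed=0):
    """Simplified split-sample test of H0: E[Y | X, Z] = E[Y | Z].

    x, z, y are 1-D arrays of equal length.
    Returns (statistic, one-sided p-value from a normal approximation).
    """
    n = len(y)
    perm = np.random.default_rng(seed).permutation(n)
    a, b = perm[: n // 2], perm[n // 2:]

    # Half A: regress Y on (X, Z) to get g_hat, then regress g_hat on Z alone;
    # the difference f_hat = g_hat - m_hat(Z) is the projection-type function.
    coef_g = _ols(_features(z[a], x[a]), y[a])
    g_a = _features(z[a], x[a]) @ coef_g
    coef_m = _ols(_features(z[a]), g_a)

    # Evaluate f_hat on the held-out half B.
    f_b = _features(z[b], x[b]) @ coef_g - _features(z[b]) @ coef_m

    # Half B: residualise Y and f_hat on Z; the mean of the products of the
    # residuals estimates E[Cov(Y, f_hat(X, Z) | Z)].
    zb = _features(z[b])
    resid_y = y[b] - zb @ _ols(zb, y[b])
    resid_f = f_b - zb @ _ols(zb, f_b)
    prods = resid_y * resid_f

    stat = math.sqrt(len(b)) * prods.mean() / prods.std(ddof=1)
    p_value = 0.5 * math.erfc(stat / math.sqrt(2))  # P(N(0, 1) > stat)
    return stat, p_value
```

A usage example on simulated data:

```python
rng = np.random.default_rng(1)
n = 2000
x, z, eps = rng.standard_normal((3, n))
print(split_covariance_test(x, z, x * z + z + eps))  # X matters only via an interaction
print(split_covariance_test(x, z, z + eps))          # null holds: mean does not depend on X
```

In the first case $X$ affects $Y$ only through an interaction with $Z$, so a large statistic would be expected. The test is one-sided because, heuristically, $\hat f$ is fitted on half A to track the part of $\mathbb{E}[Y \mid X, Z]$ not explained by $Z$, so its conditional covariance with $Y$ on half B should be positive under the alternative and zero under the null.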
DOI: 10.48550/arxiv.2211.02039