Testing conditional mean through regression model sequence using Yanai’s generalized coefficient of determination

Bibliographic details
Published in: Computational Statistics & Data Analysis 2021-06, Vol. 158, p. 107168, Article 107168
Author: Ueki, Masao
Format: Article
Language: English
Description
Abstract: In high-dimensional data analysis, such as in genomics, repeated univariate regression on each variable is used to screen for useful variables. However, signals that are detectable only jointly with other variables may be overlooked. While the saturated model using all variables may not work for high-dimensional data, group-wise analysis of groups pre-defined from prior knowledge is often used instead, but its power is limited when that knowledge is insufficient. A flexible test procedure for the conditional mean is therefore proposed, applicable to a variety of model sequences that bridge low- and high-complexity models, as in penalized regression. The test is based on the model that maximizes a generalization of Yanai's generalized coefficient of determination, exploiting the tendency for the dimensionality to be large under the null hypothesis. The test does not require complicated computation of the null distribution, which makes it applicable to large-scale testing. Numerical studies demonstrated that the proposed test, applied to the lasso and the elastic net, had high power regardless of the simulation scenario. Applied to a group-wise analysis of real genome-wide association study data from the Alzheimer's Disease Neuroimaging Initiative, the proposed test gave a stronger association signal than existing methods.
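The abstract does not state the generalized statistic itself. For orientation only, the following Python/NumPy sketch computes the classical Yanai's generalized coefficient of determination (GCD) between two column spaces, GCD(X, Y) = tr(P_X P_Y) / sqrt(tr(P_X) tr(P_Y)), where P_A is the orthogonal projection onto the column space of A. It is not the paper's generalization or its test procedure; the function names `projection` and `yanai_gcd` and all toy data are hypothetical.

```python
import numpy as np


def projection(A: np.ndarray) -> np.ndarray:
    """Orthogonal projection matrix onto the column space of A.

    Uses the Moore-Penrose pseudoinverse, so rank-deficient A is handled.
    """
    return A @ np.linalg.pinv(A)


def yanai_gcd(X: np.ndarray, Y: np.ndarray) -> float:
    """Classical Yanai GCD between col(X) and col(Y):
    tr(P_X P_Y) / sqrt(tr(P_X) * tr(P_Y)).

    tr(P) of a projection equals the dimension (rank) of its column space.
    """
    Px, Py = projection(X), projection(Y)
    return float(np.trace(Px @ Py) / np.sqrt(np.trace(Px) * np.trace(Py)))


# Toy checks in R^6 (hypothetical data).
I6 = np.eye(6)
X = I6[:, :4]                       # a 4-dimensional subspace
Z = I6[:, 4:]                       # orthogonal complement of col(X)
y = (I6[:, 0] + I6[:, 1])[:, None]  # a single vector lying inside col(X)

print(yanai_gcd(X, X))  # identical subspaces
print(yanai_gcd(X, Z))  # orthogonal subspaces
print(yanai_gcd(X, y))  # y fitted exactly, but divided by sqrt(rank X) = 2

# For a single response y, GCD(X, y) equals the uncentered R^2 of the
# least-squares regression of y on X, divided by sqrt(rank X).
rng = np.random.default_rng(0)
Xr = rng.standard_normal((50, 5))
yr = rng.standard_normal((50, 1))
beta, *_ = np.linalg.lstsq(Xr, yr, rcond=None)
r2_uncentered = 1.0 - float(np.sum((yr - Xr @ beta) ** 2) / np.sum(yr ** 2))
print(np.isclose(yanai_gcd(Xr, yr), r2_uncentered / np.sqrt(5)))
```

Because tr(P_X) is the rank of X, a saturated model with rank X = n reproduces y exactly (R² = 1) yet receives GCD = 1/sqrt(n) against a single response. This built-in dimension penalty of the classical GCD is the property the sketch is meant to illustrate.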
ISSN: 0167-9473; 1872-7352
DOI: 10.1016/j.csda.2021.107168