Model-free feature screening via distance correlation for ultrahigh dimensional survival data


Bibliographic details
Published in: Statistical Papers (Berlin, Germany), 2021-12, Vol. 62 (6), p. 2711-2738
Main authors: Zhang, Jing; Liu, Yanyan; Cui, Hengjian
Format: Article
Language: English
Description
Abstract: With the explosion of ultrahigh dimensional data in various fields, many sure independence screening methods have been proposed to reduce the dimensionality of data from a large scale to a relatively moderate scale. For censored survival data, the existing screening methods mainly adopt the Kaplan–Meier estimator to handle censoring, which may not perform well under heavy censoring. In this article, we propose a novel sure independence screening procedure for ultrahigh dimensional survival data, based on distance correlation after standardizing the marginal variables. It is a model-free approach and does not involve the Kaplan–Meier estimator, so its performance is much more robust than that of the existing methods. The proposed method has further advantages: it avoids the need to specify an actual model for a large number of covariates; it enjoys the sure screening property and ranking consistency under some mild regularity conditions; and it does not require any complicated numerical optimization, so the computation is simple and fast. Extensive numerical studies demonstrate that the proposed method performs favorably compared with the existing methods. As an illustration, we apply the proposed method to a gene expression data set.
ISSN: 0932-5026, 1613-9798
DOI: 10.1007/s00362-020-01210-3
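
The abstract describes ranking covariates by distance correlation and keeping the top-ranked ones. The following is a minimal sketch of that general idea for a fully observed response, not the authors' procedure: the record does not specify the standardization step or how censoring is handled without the Kaplan–Meier estimator, so both are omitted here. The function names `distance_correlation` and `screen_by_distance_correlation` and the cutoff `d` are hypothetical, chosen only for illustration.

```python
import numpy as np


def _centered_distances(v):
    """Pairwise Euclidean distance matrix of v, double-centered."""
    v = np.asarray(v, dtype=float)
    v = v.reshape(len(v), -1)  # treat 1-D input as n observations of one variable
    diff = v[:, None, :] - v[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y):
    """Sample distance correlation (V-statistic form) between x and y."""
    A = _centered_distances(x)
    B = _centered_distances(y)
    dcov2 = (A * B).mean()
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    if denom == 0:
        return 0.0  # a constant variable carries no dependence information
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def screen_by_distance_correlation(X, y, d):
    """Rank the columns of X by distance correlation with y and keep the top d.

    Assumption: y is fully observed. Censoring is NOT handled in this sketch.
    """
    X = np.asarray(X, dtype=float)
    scores = np.array([distance_correlation(X[:, j], y) for j in range(X.shape[1])])
    top = np.argsort(-scores)[:d]
    return top, scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 50))
    # Synthetic response that depends nonlinearly on the first covariate only.
    y = X[:, 0] ** 2 + 0.1 * rng.normal(size=200)
    top, scores = screen_by_distance_correlation(X, y, d=5)
    print("retained covariate indices:", top)
```

Each covariate is scored on its own and no model is fitted, which matches the abstract's points that the approach is model-free and needs no numerical optimization. The pairwise distance matrices make each score cost O(n^2) in the sample size.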