Optimal cross-validation in density estimation with the L^sup 2^-loss

We analyze the performance of cross-validation (CV) in the density estimation framework with two purposes: (i) risk estimation and (ii) model selection. The main focus is given to the so-called leave-p-out CV procedure (Lpo), where p denotes the cardinality of the test set. Closed-form expressions a...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:The Annals of statistics 2014-10, Vol.42 (5), p.1879
1. Verfasser: Celisse, Alain
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:We analyze the performance of cross-validation (CV) in the density estimation framework with two purposes: (i) risk estimation and (ii) model selection. The main focus is given to the so-called leave-p-out CV procedure (Lpo), where p denotes the cardinality of the test set. Closed-form expressions are settled for the Lpo estimator of the risk of projection estimators. These expressions provide a great improvement upon V-fold cross-validation in terms of variability and computational complexity. From a theoretical point of view, closed-form expressions also enable to study the Lpo performance in terms of risk estimation. The optimality of leave-one-out (Loo), that is Lpo with p=1, is proved among CV procedures used for risk estimation. Two model selection frameworks are also considered: estimation, as opposed to identification. For estimation with finite sample size nn, optimality is achieved for pp large enough [with p/n=o(1)] to balance the overfitting resulting from the structure of the model collection. For identification, model selection consistency is settled for Lpo as long as p/n is conveniently related to the rate of convergence of the best estimator in the collection: (i) ... as ... with a parametric rate, and (ii) p/n=o(1) with some nonparametric estimators. These theoretical results are validated by simulation experiments. (ProQuest: ... denotes formulae/symbols omitted.)
ISSN:0090-5364
2168-8966