Optimal cross-validation in density estimation with the L²-loss

We analyze the performance of cross-validation (CV) in the density estimation framework with two purposes: (i) risk estimation and (ii) model selection. The main focus is given to the so-called leave-p-out CV procedure (Lpo), where p denotes the cardinality of the test set. Closed-form expressions a...
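
For intuition about the closed-form expressions mentioned in the abstract, here is a minimal sketch for one simple projection estimator: a regular histogram on [0, 1] with D bins of width h = 1/D. The formula in `lpo_closed_form` is not quoted from this record. It is derived here by averaging the hold-out L² contrast (the integral of the squared training-set histogram minus twice the mean of that histogram at the test points) over all test sets of size p, and it should be checked against the paper before being relied on. The function names, the restriction to [0, 1], and the equal-width bins are assumptions made for illustration. The brute-force version enumerates every split and is only practical for tiny samples; it is included to check the closed form numerically.

```python
from itertools import combinations

import numpy as np


def bin_index(x, n_bins):
    """Map points in [0, 1] to regular bin indices 0..n_bins-1 (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    return np.minimum((x * n_bins).astype(int), n_bins - 1)


def lpo_closed_form(x, n_bins, p):
    """Leave-p-out risk estimate of a regular histogram, computed from bin counts only.

    Assumes 1 <= p <= n - 1 and data in [0, 1].
    """
    n = len(x)
    h = 1.0 / n_bins
    counts = np.bincount(bin_index(x, n_bins), minlength=n_bins).astype(float)
    s = float(np.sum(counts ** 2))  # sum of squared bin counts
    return ((2 * n - p) / ((n - 1) * (n - p) * h)
            - (n - p + 1) * s / (n * (n - 1) * (n - p) * h))


def lpo_brute_force(x, n_bins, p):
    """Same quantity, obtained by averaging over every test set of size p."""
    n = len(x)
    h = 1.0 / n_bins
    bins = bin_index(x, n_bins)
    total, n_splits = 0.0, 0
    for test in combinations(range(n), p):
        test = list(test)
        train_mask = np.ones(n, dtype=bool)
        train_mask[test] = False
        # Histogram density fitted on the n - p training points.
        dens = np.bincount(bins[train_mask], minlength=n_bins) / ((n - p) * h)
        # Hold-out L2 contrast: integral of dens^2 minus 2 * mean density at test points.
        total += np.sum(dens ** 2) * h - 2.0 * np.mean(dens[bins[test]])
        n_splits += 1
    return total / n_splits
```

The closed form needs only the bin counts, so its cost does not depend on p, whereas the brute-force loop visits all C(n, p) splits.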

Bibliographic details
Published in:	The Annals of statistics 2014-10, Vol.42 (5), p.1879
Main author:	Celisse, Alain
Format:	Article
Language:	eng
Subjects:	Estimating techniques; Mathematical models; Risk assessment; Sample size; Simulation; Studies
Online access:	Full text

container_end_page
container_issue	5
container_start_page	1879
container_title	The Annals of statistics
container_volume	42
creator	Celisse, Alain
description	We analyze the performance of cross-validation (CV) in the density estimation framework with two purposes: (i) risk estimation and (ii) model selection. The main focus is given to the so-called leave-p-out CV procedure (Lpo), where p denotes the cardinality of the test set. Closed-form expressions are settled for the Lpo estimator of the risk of projection estimators. These expressions provide a great improvement upon V-fold cross-validation in terms of variability and computational complexity. From a theoretical point of view, closed-form expressions also enable to study the Lpo performance in terms of risk estimation. The optimality of leave-one-out (Loo), that is Lpo with p=1, is proved among CV procedures used for risk estimation. Two model selection frameworks are also considered: estimation, as opposed to identification. For estimation with finite sample size n, optimality is achieved for p large enough [with p/n=o(1)] to balance the overfitting resulting from the structure of the model collection. For identification, model selection consistency is settled for Lpo as long as p/n is conveniently related to the rate of convergence of the best estimator in the collection: (i) ... as ... with a parametric rate, and (ii) p/n=o(1) with some nonparametric estimators. These theoretical results are validated by simulation experiments. (ProQuest: ... denotes formulae/symbols omitted.)
format	Article
fullrecord	<record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_1787036342</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>4046310261</sourcerecordid><originalsourceid>FETCH-proquest_journals_17870363423</originalsourceid><addsrcrecordid>eNqNi10LgjAYRkcUZB__4YWuB9Pp1Oswugi66VoZunCyNvOdRf8-i35AVw-c55wZCaJQZDTLhZiTgLGc0YSLeElWiB1jLMljHpDi3Ht9kwbqwSHShzS6kV47C9pCoyxq_wKFH-dLn9q34FsFpxLHHqKSmqnbkMVVGlTb367J7lBc9kfaD-4-TnnVuXGw01WFaZYyLngc8f-sN51rPFM</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1787036342</pqid></control><display><type>article</type><title>Optimal cross-validation in density estimation with the L^sup 2^-loss</title><source>Jstor Complete Legacy</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>Project Euclid Complete</source><source>JSTOR Mathematics & Statistics</source><creator>Celisse, Alain</creator><creatorcontrib>Celisse, Alain</creatorcontrib><description>We analyze the performance of cross-validation (CV) in the density estimation framework with two purposes: (i) risk estimation and (ii) model selection. The main focus is given to the so-called leave-p-out CV procedure (Lpo), where p denotes the cardinality of the test set. Closed-form expressions are settled for the Lpo estimator of the risk of projection estimators. These expressions provide a great improvement upon V-fold cross-validation in terms of variability and computational complexity. From a theoretical point of view, closed-form expressions also enable to study the Lpo performance in terms of risk estimation. The optimality of leave-one-out (Loo), that is Lpo with p=1, is proved among CV procedures used for risk estimation. Two model selection frameworks are also considered: estimation, as opposed to identification. 
For estimation with finite sample size n, optimality is achieved for p large enough [with p/n=o(1)] to balance the overfitting resulting from the structure of the model collection. For identification, model selection consistency is settled for Lpo as long as p/n is conveniently related to the rate of convergence of the best estimator in the collection: (i) ... as ... with a parametric rate, and (ii) p/n=o(1) with some nonparametric estimators. These theoretical results are validated by simulation experiments. (ProQuest: ... denotes formulae/symbols omitted.)</description><identifier>ISSN: 0090-5364</identifier><identifier>EISSN: 2168-8966</identifier><language>eng</language><publisher>Hayward: Institute of Mathematical Statistics</publisher><subject>Estimating techniques ; Mathematical models ; Risk assessment ; Sample size ; Simulation ; Studies</subject><ispartof>The Annals of statistics, 2014-10, Vol.42 (5), p.1879</ispartof><rights>Copyright Institute of Mathematical Statistics Oct 2014</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,776,780</link.rule.ids></links><search><creatorcontrib>Celisse, Alain</creatorcontrib><title>Optimal cross-validation in density estimation with the L^sup 2^-loss</title><title>The Annals of statistics</title><description>We analyze the performance of cross-validation (CV) in the density estimation framework with two purposes: (i) risk estimation and (ii) model selection. The main focus is given to the so-called leave-p-out CV procedure (Lpo), where p denotes the cardinality of the test set. Closed-form expressions are settled for the Lpo estimator of the risk of projection estimators. 
These expressions provide a great improvement upon V-fold cross-validation in terms of variability and computational complexity. From a theoretical point of view, closed-form expressions also enable to study the Lpo performance in terms of risk estimation. The optimality of leave-one-out (Loo), that is Lpo with p=1, is proved among CV procedures used for risk estimation. Two model selection frameworks are also considered: estimation, as opposed to identification. For estimation with finite sample size n, optimality is achieved for p large enough [with p/n=o(1)] to balance the overfitting resulting from the structure of the model collection. For identification, model selection consistency is settled for Lpo as long as p/n is conveniently related to the rate of convergence of the best estimator in the collection: (i) ... as ... with a parametric rate, and (ii) p/n=o(1) with some nonparametric estimators. These theoretical results are validated by simulation experiments. (ProQuest: ... denotes formulae/symbols omitted.)</description><subject>Estimating techniques</subject><subject>Mathematical models</subject><subject>Risk assessment</subject><subject>Sample size</subject><subject>Simulation</subject><subject>Studies</subject><issn>0090-5364</issn><issn>2168-8966</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2014</creationdate><recordtype>article</recordtype><recordid>eNqNi10LgjAYRkcUZB__4YWuB9Pp1Oswugi66VoZunCyNvOdRf8-i35AVw-c55wZCaJQZDTLhZiTgLGc0YSLeElWiB1jLMljHpDi3Ht9kwbqwSHShzS6kV47C9pCoyxq_wKFH-dLn9q34FsFpxLHHqKSmqnbkMVVGlTb367J7lBc9kfaD-4-TnnVuXGw01WFaZYyLngc8f-sN51rPFM</recordid><startdate>20141001</startdate><enddate>20141001</enddate><creator>Celisse, Alain</creator><general>Institute of Mathematical Statistics</general><scope>JQ2</scope></search><sort><creationdate>20141001</creationdate><title>Optimal cross-validation in density estimation with the L^sup 2^-loss</title><author>Celisse, 
Alain</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-proquest_journals_17870363423</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2014</creationdate><topic>Estimating techniques</topic><topic>Mathematical models</topic><topic>Risk assessment</topic><topic>Sample size</topic><topic>Simulation</topic><topic>Studies</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Celisse, Alain</creatorcontrib><collection>ProQuest Computer Science Collection</collection><jtitle>The Annals of statistics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Celisse, Alain</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Optimal cross-validation in density estimation with the L^sup 2^-loss</atitle><jtitle>The Annals of statistics</jtitle><date>2014-10-01</date><risdate>2014</risdate><volume>42</volume><issue>5</issue><spage>1879</spage><pages>1879-</pages><issn>0090-5364</issn><eissn>2168-8966</eissn><abstract>We analyze the performance of cross-validation (CV) in the density estimation framework with two purposes: (i) risk estimation and (ii) model selection. The main focus is given to the so-called leave-p-out CV procedure (Lpo), where p denotes the cardinality of the test set. Closed-form expressions are settled for the Lpo estimator of the risk of projection estimators. These expressions provide a great improvement upon V-fold cross-validation in terms of variability and computational complexity. From a theoretical point of view, closed-form expressions also enable to study the Lpo performance in terms of risk estimation. The optimality of leave-one-out (Loo), that is Lpo with p=1, is proved among CV procedures used for risk estimation. Two model selection frameworks are also considered: estimation, as opposed to identification. 
For estimation with finite sample size n, optimality is achieved for p large enough [with p/n=o(1)] to balance the overfitting resulting from the structure of the model collection. For identification, model selection consistency is settled for Lpo as long as p/n is conveniently related to the rate of convergence of the best estimator in the collection: (i) ... as ... with a parametric rate, and (ii) p/n=o(1) with some nonparametric estimators. These theoretical results are validated by simulation experiments. (ProQuest: ... denotes formulae/symbols omitted.)</abstract><cop>Hayward</cop><pub>Institute of Mathematical Statistics</pub></addata></record>
fulltext	fulltext
identifier	ISSN: 0090-5364
ispartof	The Annals of statistics, 2014-10, Vol.42 (5), p.1879
issn	0090-5364 2168-8966
language	eng
recordid	cdi_proquest_journals_1787036342
source	Jstor Complete Legacy; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; Project Euclid Complete; JSTOR Mathematics & Statistics
subjects	Estimating techniques Mathematical models Risk assessment Sample size Simulation Studies
title	Optimal cross-validation in density estimation with the L²-loss
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-05T07%3A02%3A41IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Optimal%20cross-validation%20in%20density%20estimation%20with%20the%20L%5Esup%202%5E-loss&rft.jtitle=The%20Annals%20of%20statistics&rft.au=Celisse,%20Alain&rft.date=2014-10-01&rft.volume=42&rft.issue=5&rft.spage=1879&rft.pages=1879-&rft.issn=0090-5364&rft.eissn=2168-8966&rft_id=info:doi/&rft_dat=%3Cproquest%3E4046310261%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1787036342&rft_id=info:pmid/&rfr_iscdi=true