Minimization and estimation of the variance of prediction errors for cross-validation designs


Detailed description

Saved in:
Bibliographic details
Main authors: Fuchs, Mathias; Krautenbacher, Norbert
Format: Dataset
Language: eng
Subjects:
Online access: Order full text
creator Fuchs, Mathias
Krautenbacher, Norbert
description We consider the mean prediction error of a classification or regression procedure as well as its cross-validation estimates, and investigate the variance of this estimate as a function of an arbitrary cross-validation design. We decompose this variance into a scalar product of coefficients and certain covariance expressions, such that the coefficients depend solely on the resampling design, and the covariances depend solely on the data’s probability distribution. We rewrite this scalar product in such a form that the initially large number of summands can gradually be decreased down to three under the validity of a quadratic approximation to the core covariances. We show an analytical example in which this quadratic approximation holds true exactly. Moreover, in this example, we show that the leave-p-out estimator of the error depends on p only by means of a constant and can, therefore, be written in a much simpler form. Furthermore, there is an unbiased estimator of the variance of K-fold cross-validation, in contrast to a claim in the literature. As a consequence, we can show that balanced incomplete block designs have smaller variance than K-fold cross-validation. In a real data example from the UCI machine learning repository, this property can be confirmed. We finally show how to find balanced incomplete block designs in practice.
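The abstract claims that balanced incomplete block designs (BIBDs) used as cross-validation designs can have smaller variance than K-fold cross-validation. A minimal sketch of this comparison (not the authors' implementation; the learner, sample size, and data distribution here are illustrative assumptions): the Fano plane, a (7, 3, 1)-BIBD, serves as the set of test blocks, an intercept-only model is the learner, and the variance of the resulting error estimates is compared with the leave-p-out design over simulated data.

```python
import itertools
import numpy as np

# Fano plane: a (7, 3, 1) balanced incomplete block design --
# every unit lies in 3 blocks and every pair of units in exactly 1 block.
FANO_BLOCKS = [
    (0, 1, 2), (0, 3, 4), (0, 5, 6),
    (1, 3, 5), (1, 4, 6), (2, 3, 6), (2, 4, 5),
]

def cv_estimate(y, test_sets):
    """Cross-validated mean squared error over the given test blocks.
    The 'learner' is an intercept-only model: predict the training mean."""
    errs = []
    for test in test_sets:
        train = [i for i in range(len(y)) if i not in test]
        pred = np.mean(y[train])
        errs.extend((y[t] - pred) ** 2 for t in test)
    return np.mean(errs)

n, p = 7, 3
all_p_subsets = list(itertools.combinations(range(n), p))  # leave-p-out design
rng = np.random.default_rng(0)
bibd_est, lpo_est = [], []
for _ in range(2000):
    y = rng.normal(size=n)                       # i.i.d. standard normal responses
    bibd_est.append(cv_estimate(y, FANO_BLOCKS))
    lpo_est.append(cv_estimate(y, all_p_subsets))

print("var(BIBD CV estimate):        ", np.var(bibd_est))
print("var(leave-p-out CV estimate): ", np.var(lpo_est))
```

The BIBD design needs only 7 model fits per dataset instead of the 35 required by full leave-p-out, while its balance (equal replication of units and pairs) keeps the design-induced part of the variance controlled, which is the trade-off the paper analyzes.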
doi_str_mv 10.6084/m9.figshare.3124324
format Dataset
fulltext fulltext_linktorsrc
identifier DOI: 10.6084/m9.figshare.3124324
language eng
recordid cdi_datacite_primary_10_6084_m9_figshare_3124324
source DataCite
subjects Biological Sciences not elsewhere classified
Environmental Sciences not elsewhere classified
FOS: Biological sciences
FOS: Earth and related environmental sciences
FOS: Mathematics
Genetics
Mathematical Sciences not elsewhere classified
Molecular Biology
Plant Biology
Science Policy
Space Science
title Minimization and estimation of the variance of prediction errors for cross-validation designs
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-09T22%3A23%3A58IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-datacite_PQ8&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=unknown&rft.au=Fuchs,%20Mathias&rft.date=2016-03-25&rft_id=info:doi/10.6084/m9.figshare.3124324&rft_dat=%3Cdatacite_PQ8%3E10_6084_m9_figshare_3124324%3C/datacite_PQ8%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true