Estimation of predictive performance for test data in applicability domains using y‐randomization

A new measure of predictive performance for objective variables in regression analysis is proposed, enabling the y‐errors of new samples or test samples to be estimated in the applicability domains (ADs) of regression models. The proposed measure, based on y‐randomization, considers chance correlati...

Bibliographic details
Published in: Journal of chemometrics 2019-09, Vol.33 (9), p.n/a
Main author: Kaneko, Hiromasa
Format: Article
Language: eng
Online access: Full text
container_end_page n/a
container_issue 9
container_start_page
container_title Journal of chemometrics
container_volume 33
creator Kaneko, Hiromasa
description A new measure of predictive performance for objective variables in regression analysis is proposed, enabling the y‐errors of new samples or test samples to be estimated in the applicability domains (ADs) of regression models. The proposed measure, based on y‐randomization, considers chance correlations and is calculated using only training data. This chance correlation‐excluded mean absolute error (MAECCE) can estimate the y‐errors of new samples considering the influence of chance correlations in the given dataset on the regression models. Experiments using numerical simulation, quantitative structure‐activity relationship, and quantitative structure‐property relationship datasets confirm that MAECCE can estimate the distribution of y‐errors of new samples in ADs for various training datasets, descriptor sets, and regression analysis methods, enabling chance correlations to be eliminated from data analysis. Python and MATLAB codes for the proposed algorithm are available at https://github.com/hkaneko1985/maecce.
doi_str_mv 10.1002/cem.3171
format Article
eissn 1099-128X
orcid https://orcid.org/0000-0001-8367-6476
pqid 2288558727
publisher Chichester: Wiley Subscription Services, Inc
rights 2019 John Wiley & Sons, Ltd.
tpages 12
toplevel peer_reviewed
online_resources
linktohtml https://onlinelibrary.wiley.com/doi/full/10.1002%2Fcem.3171
linktopdf https://onlinelibrary.wiley.com/doi/pdf/10.1002%2Fcem.3171
collection CrossRef
Computer and Information Systems Abstracts
Solid State and Superconductivity Abstracts
Technology Research Database
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
fulltext fulltext
identifier ISSN: 0886-9383
ispartof Journal of chemometrics, 2019-09, Vol.33 (9), p.n/a
issn 0886-9383
1099-128X
language eng
recordid cdi_proquest_journals_2288558727
source Wiley Online Library Journals Frontfile Complete
subjects Algorithms
applicability domains
Computer simulation
Correlation analysis
Data analysis
Datasets
Domains
Performance prediction
predictive performance
QSAR
QSPR
Randomization
regression
Regression analysis
Regression models
Training
y‐randomization
title Estimation of predictive performance for test data in applicability domains using y‐randomization
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-29T19%3A59%3A42IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Estimation%20of%20predictive%20performance%20for%20test%20data%20in%20applicability%20domains%20using%20y%E2%80%90randomization&rft.jtitle=Journal%20of%20chemometrics&rft.au=Kaneko,%20Hiromasa&rft.date=2019-09&rft.volume=33&rft.issue=9&rft.epage=n/a&rft.issn=0886-9383&rft.eissn=1099-128X&rft_id=info:doi/10.1002/cem.3171&rft_dat=%3Cproquest_cross%3E2288558727%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2288558727&rft_id=info:pmid/&rfr_iscdi=true
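
The description field above says the proposed measure is based on y‐randomization and is computed from training data only. The record does not give the MAECCE formula, so the following is only a minimal sketch of generic y‐randomization, not the author's algorithm: it shuffles the y-values, refits a model, and records the mean absolute error reached when any real x-y relationship has been destroyed. The function names, the single-descriptor least-squares model, and the toy data are all assumptions made for illustration; the author's own code is at the GitHub URL given in the description.

```python
import random
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x with one descriptor (illustrative model)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def mae(x, y, a, b):
    """Mean absolute error of the fitted line on (x, y)."""
    return mean(abs(yi - (a + b * xi)) for xi, yi in zip(x, y))


def y_randomization_mae(x, y, n_shuffles=100, seed=0):
    """Average training MAE over models fitted to randomly permuted y.

    Permuting y breaks the true x-y relationship, so whatever fit remains
    is attributable to chance correlation in this dataset.
    """
    rng = random.Random(seed)
    maes = []
    for _ in range(n_shuffles):
        y_shuffled = list(y)
        rng.shuffle(y_shuffled)
        a, b = fit_line(x, y_shuffled)
        maes.append(mae(x, y_shuffled, a, b))
    return mean(maes)


# Hypothetical toy data: a linear signal plus Gaussian noise.
rng = random.Random(1)
x = [i / 10 for i in range(50)]
y = [2.0 * xi + rng.gauss(0, 0.3) for xi in x]

a, b = fit_line(x, y)
mae_train = mae(x, y, a, b)             # error of the model on the real y
mae_random = y_randomization_mae(x, y)  # error level after y-randomization
```

Comparing `mae_train` with `mae_random` shows how far the real model is from the chance-correlation baseline; how the paper combines such quantities into MAECCE is not stated in this record.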