Sorting variables by using informative vectors as a strategy for feature selection in multivariate regression

Bibliographic details
Published in:	Journal of Chemometrics, 2009-01, Vol. 23 (1), pp. 32-48
Main authors:	Teófilo, Reinaldo F.; Martins, João Paulo A.; Ferreira, Márcia M. C.
Format:	Article
Language:	English
Keywords:	Calibration; Chemistry; chemometrics; Comparative analysis; Exact sciences and technology; General and physical chemistry; General. Nomenclature, chemical documentation, computer chemistry; informative vectors; Multivariate analysis; OPS; partial least squares; Regression analysis; Theory of reactions, general kinetics. Catalysis. Nomenclature, chemical documentation, computer chemistry; variable selection
Online access:	Full text

container_end_page	48
container_issue	1
container_start_page	32
container_title	Journal of chemometrics
container_volume	23
creator	Teófilo, Reinaldo F.; Martins, João Paulo A.; Ferreira, Márcia M. C.
description	A new procedure is presented that improves the predictive ability of multivariate calibration models while using a small number of interpretable variables. The core of the methodology is to sort the variables according to an informative vector and then systematically investigate PLS regression models, with the aim of finding the most relevant set of variables by comparing the cross-validation parameters of the models obtained. In this work, seven main informative vectors, namely the regression vector, correlation vector, residual vector, variable influence on projection (VIP), net analyte signal (NAS), covariance procedures vector (CovProc) and signal-to-noise ratio vector (StN), as well as their combinations, were automated and tested for feature selection. Six data sets from different sources were used to validate the methodology: near-infrared (NIR) spectroscopy, Raman spectroscopy, gas chromatography (GC), fluorescence spectroscopy, quantitative structure-activity relationships (QSAR) and computer simulation. The results indicate that all vectors and their combinations improved prediction capability relative to the full data sets. However, the regression and NAS informative vectors from partial least squares (PLS) regression, both built with more latent variables than the final model, were the best informative vectors for variable selection in most of the tested data sets. In all applications, the selected variables were effective and useful for interpretation. Copyright © 2008 John Wiley & Sons, Ltd.
A new variable selection procedure to enhance the prediction of multivariate calibration models is presented. The methodology sorts the variables according to an informative vector, systematically investigates PLS regression models and finds the most relevant set of variables by comparing the cross-validation parameters of the models obtained. Seven informative vectors and their combinations were successfully tested on data sets from different applications.
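The procedure summarized in the description (rank the variables by an informative vector, fit PLS models on progressively larger sets of top-ranked variables, and keep the set with the best cross-validation statistics) can be sketched in Python with NumPy as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the PLS1/NIPALS routine, the interleaved k-fold scheme, ranking by absolute value, the use of RMSECV as the only selection criterion, and the function and parameter names (`ops_search`, `window`, `step`, `n_comp`, `n_folds`) are all hypothetical choices. Only three of the seven informative vectors (regression vector, correlation vector and VIP) are shown; NAS, residual, CovProc and StN are not implemented here.

```python
import numpy as np


def pls1_nipals(X, y, n_comp):
    """Fit a PLS1 model with NIPALS on mean-centred X and y (illustrative sketch).

    Returns (b, x_mean, y_mean, W, T, q): regression vector, column means of X,
    mean of y, unit-length weights, scores and y-loadings.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    n, p = X.shape
    W, P = np.zeros((p, n_comp)), np.zeros((p, n_comp))
    T, q = np.zeros((n, n_comp)), np.zeros(n_comp)
    for a in range(n_comp):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)          # unit-length weight vector
        t = Xk @ w                      # scores of component a
        tt = t @ t
        P[:, a] = Xk.T @ t / tt         # X-loadings
        q[a] = yk @ t / tt              # y-loading
        Xk = Xk - np.outer(t, P[:, a])  # deflate X
        yk = yk - q[a] * t              # deflate y
        W[:, a], T[:, a] = w, t
    b = W @ np.linalg.solve(P.T @ W, q)  # b = W (P'W)^-1 q
    return b, x_mean, y_mean, W, T, q


def correlation_vector(X, y):
    """Absolute Pearson correlation of each variable with y."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    return np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))


def vip_vector(X, y, n_comp):
    """Variable influence on projection computed from a PLS1 model."""
    _, _, _, W, T, q = pls1_nipals(X, y, n_comp)
    ss = q ** 2 * np.sum(T ** 2, axis=0)  # y-variance explained per component
    return np.sqrt(X.shape[1] * (W ** 2 @ ss) / ss.sum())


def rmsecv(X, y, n_comp, n_folds=5):
    """Root-mean-square error of cross-validation with interleaved folds."""
    folds = np.arange(len(y)) % n_folds
    press = 0.0
    for f in range(n_folds):
        train, test = folds != f, folds == f
        b, x_mean, y_mean, *_ = pls1_nipals(X[train], y[train], n_comp)
        y_pred = (X[test] - x_mean) @ b + y_mean
        press += np.sum((y[test] - y_pred) ** 2)
    return np.sqrt(press / len(y))


def ops_search(X, y, informative, n_comp=3, window=5, step=5, n_folds=5):
    """Sort variables by |informative| and grow the subset, keeping the lowest RMSECV."""
    order = np.argsort(-np.abs(informative))  # most informative variable first
    best_err, best_cols = np.inf, order
    for k in range(window, X.shape[1] + 1, step):
        cols = order[:k]
        err = rmsecv(X[:, cols], y, min(n_comp, k), n_folds)
        if err < best_err:
            best_err, best_cols = err, cols
    return best_err, np.sort(best_cols)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(40, 30))
    y = X[:, :4] @ np.array([2.0, -1.5, 1.0, 0.5]) + 0.1 * rng.normal(size=40)
    # Regression vector built with more latent variables (5) than the models (3).
    informative = np.abs(pls1_nipals(X, y, 5)[0])
    err, cols = ops_search(X, y, informative, n_comp=3, window=2, step=2)
    print(f"RMSECV={err:.3f} with variables {cols.tolist()}")
```

The demo computes the informative vector with more latent variables than the models it then evaluates, echoing the description's observation about the regression and NAS vectors; whether that choice helps on a given data set is something the cross-validation step has to decide.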
doi_str_mv	10.1002/cem.1192
format	Article
publisher	John Wiley & Sons, Ltd, Chichester, UK
coden	JOCHEU
rights	Copyright © 2008 John Wiley & Sons, Ltd.
linktohtml	https://onlinelibrary.wiley.com/doi/full/10.1002%2Fcem.1192
linktopdf	https://onlinelibrary.wiley.com/doi/pdf/10.1002%2Fcem.1192
backlink	http://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=21162142
identifier	ISSN: 0886-9383
ispartof	Journal of chemometrics, 2009-01, Vol.23 (1), p.32-48
issn	0886-9383 (print); 1099-128X (online)
language	eng
recordid	cdi_proquest_miscellaneous_903646470
source	Wiley Online Library Journals Frontfile Complete
subjects	Calibration; Chemistry; chemometrics; Comparative analysis; Exact sciences and technology; General and physical chemistry; General. Nomenclature, chemical documentation, computer chemistry; informative vectors; Multivariate analysis; OPS; partial least squares; Regression analysis; Theory of reactions, general kinetics. Catalysis. Nomenclature, chemical documentation, computer chemistry; variable selection
title	Sorting variables by using informative vectors as a strategy for feature selection in multivariate regression
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-18T12%3A37%3A14IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Sorting%20variables%20by%20using%20informative%20vectors%20as%20a%20strategy%20for%20feature%20selection%20in%20multivariate%20regression&rft.jtitle=Journal%20of%20chemometrics&rft.au=Te%C3%B3filo,%20Reinaldo%20F.&rft.date=2009-01&rft.volume=23&rft.issue=1&rft.spage=32&rft.epage=48&rft.pages=32-48&rft.issn=0886-9383&rft.eissn=1099-128X&rft.coden=JOCHEU&rft_id=info:doi/10.1002/cem.1192&rft_dat=%3Cproquest_cross%3E33910950%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=221227503&rft_id=info:pmid/&rfr_iscdi=true