Sorting variables by using informative vectors as a strategy for feature selection in multivariate regression

A new procedure with high ability to enhance prediction of multivariate calibration models with a small number of interpretable variables is presented. The core of this methodology is to sort the variables from an informative vector, followed by a systematic investigation of PLS regression models wi...

Detailed description

Bibliographic details
Published in: Journal of chemometrics 2009-01, Vol.23 (1), p.32-48
Main authors: Teófilo, Reinaldo F., Martins, João Paulo A., Ferreira, Márcia M. C.
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 48
container_issue 1
container_start_page 32
container_title Journal of chemometrics
container_volume 23
creator Teófilo, Reinaldo F.
Martins, João Paulo A.
Ferreira, Márcia M. C.
description A new procedure with high ability to enhance the prediction of multivariate calibration models with a small number of interpretable variables is presented. The core of this methodology is to sort the variables according to an informative vector, followed by a systematic investigation of partial least squares (PLS) regression models with the aim of finding the most relevant set of variables by comparing the cross‐validation parameters of the models obtained. In this work, seven main informative vectors, i.e. the regression vector, correlation vector, residual vector, variable influence on projection (VIP), net analyte signal (NAS), covariance procedures vector (CovProc) and signal‐to‐noise ratios vector (StN), and their combinations were automated and tested with the main purpose of feature selection. Six data sets from different sources were employed to validate this methodology. They originated from near‐infrared (NIR) spectroscopy, Raman spectroscopy, gas chromatography (GC), fluorescence spectroscopy, quantitative structure‐activity relationships (QSAR) and computer simulation. The results indicate that all vectors and their combinations were able to enhance prediction capability with respect to the full data sets. However, the regression and NAS informative vectors from PLS regression were the best informative vectors for variable selection; in most of the tested data sets, both were built using more latent variables than were used to build the model itself. In all the applications, the selected variables were quite effective and useful for interpretation. Copyright © 2008 John Wiley & Sons, Ltd. A new variable selection procedure to enhance prediction of multivariate calibration models is presented. The methodology sorts the variables according to an informative vector, systematically investigates PLS regression models and finds the most relevant set of variables by comparing the cross‐validation parameters of the models obtained. Seven informative vectors and their combinations were successfully tested on data sets from different applications.
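
The sort-then-search idea in the description can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single response, uses only the PLS regression vector as the informative vector, and scores each candidate subset by the root mean square error of cross-validation (RMSECV). The function names (pls1_fit, rmsecv, sort_and_select), the interleaved fold scheme, and the start and step parameters are hypothetical choices made for illustration.

```python
import numpy as np


def pls1_fit(X, y, n_lv):
    """Fit single-response PLS (NIPALS); return (coef, x_mean, y_mean)."""
    n_lv = min(n_lv, X.shape[1], X.shape[0] - 1)  # cannot exceed the rank
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = Xr.T @ yr                      # weight: covariance direction
        norm = np.linalg.norm(w)
        if norm < 1e-12:                   # nothing left to explain
            break
        w = w / norm
        t = Xr @ w                         # scores
        tt = t @ t
        p = Xr.T @ t / tt                  # X loadings
        c = (yr @ t) / tt                  # y loading
        Xr = Xr - np.outer(t, p)           # deflate X
        yr = yr - c * t                    # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    if not W:
        return np.zeros(X.shape[1]), x_mean, y_mean
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)  # regression vector b = W (P'W)^-1 q
    return coef, x_mean, y_mean


def rmsecv(X, y, n_lv, n_folds=5):
    """RMSECV with interleaved folds (an assumed, simple fold scheme)."""
    n = len(y)
    press = 0.0
    for k in range(n_folds):
        train = np.ones(n, dtype=bool)
        train[k::n_folds] = False          # every n_folds-th sample is held out
        coef, xm, ym = pls1_fit(X[train], y[train], n_lv)
        pred = (X[~train] - xm) @ coef + ym
        press += np.sum((y[~train] - pred) ** 2)
    return np.sqrt(press / n)


def sort_and_select(X, y, n_lv_vector, n_lv_model, start, step):
    """Sort variables by |regression coefficient|, then grow the subset."""
    coef, _, _ = pls1_fit(X, y, n_lv_vector)   # informative vector
    order = np.argsort(-np.abs(coef))          # most informative first
    best_rmse, best_vars = np.inf, order[:start]
    for size in range(start, X.shape[1] + 1, step):
        cols = order[:size]
        score = rmsecv(X[:, cols], y, n_lv_model)
        if score < best_rmse:                  # keep the best subset so far
            best_rmse, best_vars = score, cols
    return np.sort(best_vars), best_rmse
```

Two separate latent-variable counts are exposed because the description reports that the informative vectors were, in most data sets, built with more latent variables than the model. Ranking by the raw regression vector also assumes the variables are on comparable scales; the other six vectors named in the description are not covered by this sketch.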
doi_str_mv 10.1002/cem.1192
format Article
fulltext fulltext
identifier ISSN: 0886-9383
ispartof Journal of chemometrics, 2009-01, Vol.23 (1), p.32-48
issn 0886-9383
1099-128X
language eng
recordid cdi_proquest_miscellaneous_903646470
source Wiley Online Library Journals Frontfile Complete
subjects Calibration
Chemistry
chemometrics
Comparative analysis
Exact sciences and technology
General and physical chemistry
General. Nomenclature, chemical documentation, computer chemistry
informative vectors
Multivariate analysis
OPS
partial least squares
Regression analysis
Theory of reactions, general kinetics. Catalysis. Nomenclature, chemical documentation, computer chemistry
variable selection
title Sorting variables by using informative vectors as a strategy for feature selection in multivariate regression
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-18T12%3A37%3A14IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Sorting%20variables%20by%20using%20informative%20vectors%20as%20a%20strategy%20for%20feature%20selection%20in%20multivariate%20regression&rft.jtitle=Journal%20of%20chemometrics&rft.au=Te%C3%B3filo,%20Reinaldo%20F.&rft.date=2009-01&rft.volume=23&rft.issue=1&rft.spage=32&rft.epage=48&rft.pages=32-48&rft.issn=0886-9383&rft.eissn=1099-128X&rft.coden=JOCHEU&rft_id=info:doi/10.1002/cem.1192&rft_dat=%3Cproquest_cross%3E33910950%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=221227503&rft_id=info:pmid/&rfr_iscdi=true