Sorting variables by using informative vectors as a strategy for feature selection in multivariate regression
A new procedure with high ability to enhance prediction of multivariate calibration models with a small number of interpretable variables is presented. The core of this methodology is to sort the variables from an informative vector, followed by a systematic investigation of PLS regression models wi...
Published in: | Journal of chemometrics 2009-01, Vol.23 (1), p.32-48 |
---|---|
Main authors: | Teófilo, Reinaldo F. ; Martins, João Paulo A. ; Ferreira, Márcia M. C. |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 48 |
---|---|
container_issue | 1 |
container_start_page | 32 |
container_title | Journal of chemometrics |
container_volume | 23 |
creator | Teófilo, Reinaldo F. ; Martins, João Paulo A. ; Ferreira, Márcia M. C. |
description | A new procedure with a high ability to enhance the prediction of multivariate calibration models using a small number of interpretable variables is presented. The core of the methodology is to sort the variables according to an informative vector and then systematically investigate PLS regression models, with the aim of finding the most relevant set of variables by comparing the cross‐validation parameters of the models obtained. In this work, seven main informative vectors, i.e. the regression vector, correlation vector, residual vector, variable influence on projection (VIP), net analyte signal (NAS), covariance procedures vector (CovProc) and signal‐to‐noise ratios vector (StN), and their combinations were automated and tested with the main purpose of feature selection. Six data sets from different sources were employed to validate this methodology. They originated from near‐infrared (NIR) spectroscopy, Raman spectroscopy, gas chromatography (GC), fluorescence spectroscopy, quantitative structure‐activity relationships (QSAR) and computer simulation. The results indicate that all vectors and their combinations were able to enhance prediction capability with respect to the full data sets. However, the regression and NAS informative vectors from partial least squares (PLS) regression, both built using more latent variables than when building the model presented in most of the tested data sets, were the best informative vectors for variable selection. In all the applications, the selected variables were quite effective and useful for interpretation. Copyright © 2008 John Wiley & Sons, Ltd.
A new variable selection procedure to enhance prediction of multivariate calibration models is presented. The methodology sorts the variables according to an informative vector, systematically investigates PLS regression models and finds the most relevant set of variables by comparing the cross‐validation parameters of the models obtained. Seven informative vectors and their combinations were successfully tested for data sets from different applications. |
doi_str_mv | 10.1002/cem.1192 |
format | Article |
publisher | Chichester, UK: John Wiley & Sons, Ltd |
rights | Copyright © 2008 John Wiley & Sons, Ltd. |
coden | JOCHEU |
tpages | 17 |
fulltext | fulltext |
identifier | ISSN: 0886-9383 |
ispartof | Journal of chemometrics, 2009-01, Vol.23 (1), p.32-48 |
issn | 0886-9383 ; 1099-128X |
language | eng |
recordid | cdi_proquest_miscellaneous_903646470 |
source | Wiley Online Library Journals Frontfile Complete |
subjects | Calibration ; Chemistry ; chemometrics ; Comparative analysis ; Exact sciences and technology ; General and physical chemistry ; General. Nomenclature, chemical documentation, computer chemistry ; informative vectors ; Multivariate analysis ; OPS ; partial least squares ; Regression analysis ; Theory of reactions, general kinetics. Catalysis. Nomenclature, chemical documentation, computer chemistry ; variable selection |
title | Sorting variables by using informative vectors as a strategy for feature selection in multivariate regression |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-18T12%3A37%3A14IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Sorting%20variables%20by%20using%20informative%20vectors%20as%20a%20strategy%20for%20feature%20selection%20in%20multivariate%20regression&rft.jtitle=Journal%20of%20chemometrics&rft.au=Te%C3%B3filo,%20Reinaldo%20F.&rft.date=2009-01&rft.volume=23&rft.issue=1&rft.spage=32&rft.epage=48&rft.pages=32-48&rft.issn=0886-9383&rft.eissn=1099-128X&rft.coden=JOCHEU&rft_id=info:doi/10.1002/cem.1192&rft_dat=%3Cproquest_cross%3E33910950%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=221227503&rft_id=info:pmid/&rfr_iscdi=true |
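
The procedure summarized in the description (sort the variables by an informative vector, then compare cross-validation errors of models built on growing subsets of the sorted variables) can be illustrated with a small sketch. This is not the authors' implementation. It assumes the correlation vector as the informative vector, a simplified PLS1 model with a single latent variable, leave-one-out cross-validation, and synthetic data; all function names (`correlation_vector`, `pls1_one_component`, `rmsecv`, `select_variables`) are hypothetical.

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


def correlation_vector(X, y):
    """Informative vector: absolute Pearson correlation of each column with y."""
    y_mean = mean(y)
    y_dev = [v - y_mean for v in y]
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        c_mean = mean(col)
        c_dev = [v - c_mean for v in col]
        num = sum(a * b for a, b in zip(c_dev, y_dev))
        den = math.sqrt(sum(a * a for a in c_dev) * sum(b * b for b in y_dev))
        scores.append(abs(num / den) if den > 0 else 0.0)
    return scores


def pls1_one_component(X, y):
    """Fit a PLS1 model with one latent variable (a simplifying assumption)."""
    n, p = len(X), len(X[0])
    x_means = [mean([row[j] for row in X]) for j in range(p)]
    y_mean = mean(y)
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # The weight vector is proportional to X'y on mean-centered data.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w)) or 1.0
    w = [v / norm for v in w]
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
    tt = sum(v * v for v in t)
    q = sum(a * b for a, b in zip(t, yc)) / tt if tt > 0 else 0.0

    def predict(row):
        score = sum((row[j] - x_means[j]) * w[j] for j in range(p))
        return y_mean + q * score

    return predict


def rmsecv(X, y):
    """Leave-one-out root mean square error of cross-validation."""
    squared_error = 0.0
    for i in range(len(X)):
        predict = pls1_one_component(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        squared_error += (predict(X[i]) - y[i]) ** 2
    return math.sqrt(squared_error / len(X))


def select_variables(X, y):
    """Sort variables by the informative vector, then keep the best prefix."""
    info = correlation_vector(X, y)
    order = sorted(range(len(info)), key=lambda j: info[j], reverse=True)
    best_subset, best_err = None, float("inf")
    for k in range(1, len(order) + 1):
        subset = order[:k]
        X_sub = [[row[j] for j in subset] for row in X]
        err = rmsecv(X_sub, y)
        if err < best_err:
            best_subset, best_err = subset, err
    return best_subset, best_err


# Synthetic data: only columns 0 and 1 carry information about y.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(6)] for _ in range(20)]
y = [2.0 * row[0] - 1.5 * row[1] + random.gauss(0, 0.05) for row in X]

subset, err = select_variables(X, y)
full_err = rmsecv(X, y)
print("selected variables:", subset)
print("RMSECV (selected):", err)
print("RMSECV (all variables):", full_err)
```

Because the full variable set is one of the candidate subsets, the selected subset never has a higher cross-validation error than the full model in this sketch. A fuller treatment would swap in other informative vectors (for example the PLS regression vector or NAS, which the description reports as the best performers) and tune the number of latent variables.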