Comparison of variable selection methods in partial least squares regression


Bibliographic details
Published in: Journal of chemometrics 2020-06, Vol.34 (6), p.n/a
Main authors: Mehmood, Tahir; Sæbø, Solve; Liland, Kristian Hovde
Format: Article
Language: English
Online access: Full text
container_end_page n/a
container_issue 6
container_start_page
container_title Journal of chemometrics
container_volume 34
creator Mehmood, Tahir
Sæbø, Solve
Liland, Kristian Hovde
description Thanks to remarkable progress in technology, it has become increasingly easy to generate a vast number of variables from a given sample. Variable selection is essential both for data reduction and for understanding the modeled relationship. Partial least squares (PLS) regression is among the modeling approaches that handle such high-throughput data, and a considerable number of variable selection methods have been introduced for PLS. Most of these methods were reviewed in a recent study. Motivated by this, we conducted a comparison of the available methods for variable selection within PLS. The main focus of this study was to reveal patterns of dependency between the variable selection method and data properties, which can guide the choice of method in practical data analysis. To this end, a simulation study was conducted with data sets of diverse properties, such as the number of variables, the number of samples, the level of model complexity, and the information content. The results indicate that these factors (the number of variables, the number of samples, the model complexity level, and the information content), together with the PLS variant and their mutual higher‐order interactions, all significantly influence the predictive ability of the model and the choice of variable selection strategy. Variable selection methods can be divided into three groups: filter, wrapper, and embedded. The comparison of variable selection methods in PLS is based on simulated data sets of diverse characteristics. Root mean square error (RMSE) is the main comparison criterion, and a meta‐analysis is carried out. The article links data properties to variable selection methods and also explores the characteristics of the methods themselves.
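The abstract groups variable selection methods into filter, wrapper, and embedded methods and uses RMSE as the main comparison criterion. As a rough illustration of one filter-type workflow, the following sketch fits a single-response PLS model (PLS1) by NIPALS, scores each variable by variable importance in projection (VIP), keeps the variables with VIP > 1, refits on the retained variables, and compares test RMSE. It is plain Python so that it has no dependencies. The function names, the VIP > 1 cut-off (a common rule of thumb), and the simulated data are assumptions for illustration only; they are not the article's simulation design or its list of compared methods.

```python
import math
import random

# Hypothetical helper names; a sketch of a VIP filter, not the article's implementation.


def center(X, y):
    """Column-center X and center y; return the centered data and the means."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    return Xc, yc, x_mean, y_mean


def pls1_fit(X, y, n_comp):
    """Fit PLS1 (single response) with NIPALS; return weights, loadings and means."""
    Xc, yc, x_mean, y_mean = center(X, y)
    n, p = len(Xc), len(Xc[0])
    W, P, Q, SS = [], [], [], []
    for _ in range(n_comp):
        # Loading weights w are proportional to X^T y, scaled to unit length.
        w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:
            break  # X^T y has (numerically) vanished: nothing left to explain
        w = [v / norm for v in w]
        t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        p_load = [sum(Xc[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(yc[i] * t[i] for i in range(n)) / tt
        # Deflate X and y by the part explained by this component.
        Xc = [[Xc[i][j] - t[i] * p_load[j] for j in range(p)] for i in range(n)]
        yc = [yc[i] - q * t[i] for i in range(n)]
        W.append(w)
        P.append(p_load)
        Q.append(q)
        SS.append(q * q * tt)  # y sum of squares explained by this component
    return {"W": W, "P": P, "Q": Q, "SS": SS, "x_mean": x_mean, "y_mean": y_mean}


def pls1_predict(model, x):
    """Predict one sample by passing it through the fitted components in order."""
    xc = [v - m for v, m in zip(x, model["x_mean"])]
    y_hat = model["y_mean"]
    for w, p_load, q in zip(model["W"], model["P"], model["Q"]):
        t = sum(a * b for a, b in zip(xc, w))
        xc = [a - t * b for a, b in zip(xc, p_load)]
        y_hat += q * t
    return y_hat


def vip_scores(model):
    """VIP_j = sqrt(p * sum_a SS_a * w_aj^2 / sum_a SS_a) with unit-length w_a."""
    p = len(model["x_mean"])
    total = sum(model["SS"])
    return [
        math.sqrt(p * sum(ss * w[j] ** 2 for ss, w in zip(model["SS"], model["W"])) / total)
        for j in range(p)
    ]


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted responses."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def vip_filter(X, y, n_comp, threshold=1.0):
    """Filter step: fit PLS on all variables and keep those with VIP > threshold."""
    vip = vip_scores(pls1_fit(X, y, n_comp))
    return [j for j, v in enumerate(vip) if v > threshold], vip


if __name__ == "__main__":
    rng = random.Random(1)
    n, p = 80, 10
    # Hypothetical simulated data: only variables 0 and 1 carry signal.
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [3 * r[0] - 2 * r[1] + rng.gauss(0, 0.5) for r in X]
    X_tr, y_tr, X_te, y_te = X[:60], y[:60], X[60:], y[60:]
    keep, vip = vip_filter(X_tr, y_tr, n_comp=2)
    full = pls1_fit(X_tr, y_tr, 2)
    sub = pls1_fit([[r[j] for j in keep] for r in X_tr], y_tr, 2)
    rmse_full = rmse(y_te, [pls1_predict(full, r) for r in X_te])
    rmse_sub = rmse(y_te, [pls1_predict(sub, [r[j] for j in keep]) for r in X_te])
    print("kept variables:", keep)
    print(f"test RMSE, all variables: {rmse_full:.3f}; VIP-selected: {rmse_sub:.3f}")
```

Because the weight vectors have unit length, the squared VIP scores always sum to the number of variables, which is why a cut-off of 1 is a natural reference point. In practice one would use an established PLS implementation (for example, `PLSRegression` in scikit-learn) and choose the number of components by cross-validation. A wrapper method would replace the single VIP pass with a search loop that refits and re-evaluates candidate subsets, while an embedded method builds the selection into the model fitting itself.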
doi_str_mv 10.1002/cem.3226
format Article
publisher Chichester: Wiley Subscription Services, Inc
rights 2020 The Authors. Journal of Chemometrics published by John Wiley & Sons, Ltd
2020. This article is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
review peer_reviewed
access free_for_read
pages 14
orcid https://orcid.org/0000-0001-6468-9423
https://orcid.org/0000-0001-9775-8093
linktopdf https://onlinelibrary.wiley.com/doi/pdf/10.1002%2Fcem.3226
linktohtml https://onlinelibrary.wiley.com/doi/full/10.1002%2Fcem.3226
fulltext fulltext
identifier ISSN: 0886-9383
ispartof Journal of chemometrics, 2020-06, Vol.34 (6), p.n/a
issn 0886-9383
1099-128X
language eng
recordid cdi_proquest_journals_2408869035
source Wiley Online Library Journals Frontfile Complete
subjects Complexity
Computer simulation
Data analysis
Data reduction
Least squares method
PLS
variable selection
Regression analysis
Variables
title Comparison of variable selection methods in partial least squares regression
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-21T11%3A15%3A17IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Comparison%20of%20variable%20selection%20methods%20in%20partial%20least%20squares%20regression&rft.jtitle=Journal%20of%20chemometrics&rft.au=Mehmood,%20Tahir&rft.date=2020-06&rft.volume=34&rft.issue=6&rft.epage=n/a&rft.issn=0886-9383&rft.eissn=1099-128X&rft_id=info:doi/10.1002/cem.3226&rft_dat=%3Cproquest_cross%3E2408869035%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2408869035&rft_id=info:pmid/&rfr_iscdi=true