Tuning parameter identification for variable selection algorithm using the sum of ranking differences algorithm


Bibliographic Details
Published in: Journal of chemometrics, 2019-04, Vol. 33 (4), p. n/a
Main authors: Nie, Mingpeng; Meng, Liuwei; Chen, Xiaojing; Hu, Xinyu; Li, Limin; Yuan, Leimin; Shi, Wen
Format: Article
Language: English
Online access: Full text
Description:
Variable selection algorithms are often adopted to select the optimal variables from a full set of variables and are efficient for reducing the variable dimension and improving model accuracy. Nonetheless, the parameters of the variable selection method and the regression model, such as the number of latent variables of the partial least squares (PLS) model and the threshold value of the variable importance index, need to be identified. These parameters directly determine the final performance of the model. Currently, they are often determined subjectively; as a result, the model results may be accidental because of this subjective choice. To identify these parameters objectively, the sum of ranking differences (SRD) algorithm coupled with partial least squares-variable importance in projection (PLS-VIP-SRD) and partial least squares-uninformative variable elimination (PLS-UVE-SRD) was applied to determine the number of latent variables of the PLS model and the threshold value of the variable importance index. Public near-infrared data of corn were used as the calculation data. The final results show that the PLS-VIP-SRD and PLS-UVE-SRD models can determine the optimal parameter combination more effectively and objectively than the PLS-VIP and PLS-UVE models. Moreover, the selected variables are easier to interpret, and the prediction accuracy is also improved to some extent.
Summary: The SRD algorithm coupled with PLS-VIP and PLS-UVE was applied to objectively identify the number of latent variables of the PLS model and the threshold values of VIP and UVE. The final results show that the PLS-VIP(UVE)-SRD models determine the optimal parameters more effectively and objectively; the selected variables are easier to interpret, and the prediction accuracy is significantly improved.
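The description couples VIP-based variable selection with SRD so that the number of PLS latent variables and the VIP threshold can be chosen objectively rather than by hand. The sketch below illustrates that idea, not the authors' implementation: it assumes scikit-learn's PLSRegression with a standard VIP formula, uses synthetic data in place of the public corn NIR spectra, and takes the measured property values as the SRD reference ranking.

```python
# Illustrative sketch of the PLS-VIP-SRD idea (assumptions noted above; not the paper's code).
import numpy as np
from scipy.stats import rankdata
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 200))                               # placeholder "spectra": 80 samples x 200 variables
y = X[:, :5] @ rng.normal(size=5) + 0.1 * rng.normal(size=80)  # synthetic reference property

def vip_scores(pls):
    """Variable importance in projection (VIP) for a fitted PLSRegression model."""
    t, w, q = pls.x_scores_, pls.x_weights_, pls.y_loadings_
    p = w.shape[0]
    ss = np.sum(t ** 2, axis=0) * np.sum(q ** 2, axis=0)     # y-variance explained per component
    wn = w / np.linalg.norm(w, axis=0)                       # normalised weight vectors
    return np.sqrt(p * (wn ** 2 @ ss) / ss.sum())

def srd(pred, ref):
    """Sum of absolute rank differences between predictions and the reference ordering."""
    return np.abs(rankdata(pred) - rankdata(ref)).sum()

# Grid over candidate tuning parameters: latent variables and VIP threshold.
results = {}
for n_lv in (2, 4, 6, 8):
    for thr in (0.8, 1.0, 1.2):
        full = PLSRegression(n_components=n_lv).fit(X, y)
        keep = vip_scores(full) > thr                        # VIP-based variable selection
        if keep.sum() < n_lv:                                # too few variables left for this model
            continue
        sub = PLSRegression(n_components=n_lv)
        pred = cross_val_predict(sub, X[:, keep], y, cv=5).ravel()
        results[(n_lv, thr)] = srd(pred, y)                  # lower SRD = ranking closer to reference

best = min(results, key=results.get)
print("lowest-SRD parameter combination (n_latent_variables, VIP threshold):", best)
```

The same pattern would apply to PLS-UVE-SRD: replace the VIP scores with a UVE reliability criterion and rank the candidate (latent variable, threshold) combinations by their SRD values.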
DOI: 10.1002/cem.3113
ISSN: 0886-9383
EISSN: 1099-128X
Source: Wiley Online Library Journals Frontfile Complete
Subjects:
Algorithms
Corn
Economic models
Forecasting
Least squares
Model accuracy
Parameter identification
Ranking
Regression models
sum of ranking differences
tuning parameters
uninformative variable elimination
variable importance in projection
variable selection