MinE-RFE: determine the optimal subset from RFE by minimizing the subset-accuracy–defined energy

Abstract Recursive feature elimination (RFE), as one of the most popular feature selection algorithms, has been extensively applied to bioinformatics. During the training, a group of candidate subsets are generated by iteratively eliminating the least important features from the original features. H...

Detailed description

Bibliographic details
Published in: Briefings in bioinformatics 2020-03, Vol.21 (2), p.687-698
Main authors: Su, Ran; Liu, Xinyi; Wei, Leyi
Format: Article
Language: eng
Subjects:
Online access: Order full text
container_end_page 698
container_issue 2
container_start_page 687
container_title Briefings in bioinformatics
container_volume 21
creator Su, Ran
Liu, Xinyi
Wei, Leyi
description Abstract Recursive feature elimination (RFE), one of the most popular feature selection algorithms, has been extensively applied in bioinformatics. During training, a group of candidate subsets is generated by iteratively eliminating the least important features from the original features. However, how to determine the optimal subset among them remains unclear. In most current studies, either overall accuracy or subset size (SS) is used to select the most predictive features. Which of the two to use, or whether to use both, and how they affect prediction performance are still open questions. In this study, we proposed MinE-RFE, a novel RFE-based feature selection approach that considers the effect of both factors. The subset decision problem was mapped into subset-accuracy space and became an energy-minimization problem. We also provided a mathematical description of the relationship between overall accuracy and SS using Gaussian Mixture Models together with spline fitting. In addition, we comprehensively reviewed a variety of state-of-the-art applications of RFE in bioinformatics. We compared their approaches to deciding the final subset from all the candidate subsets with MinE-RFE on diverse bioinformatics data sets. We also compared MinE-RFE with some widely used feature selection algorithms. The comparative results demonstrate that the proposed approach exhibits the best performance among all the approaches. To facilitate the use of MinE-RFE, we further established a user-friendly web server implementing the proposed approach, which is accessible at http://qgking.wicp.net/MinE/. We expect this web server will be a useful tool for the research community.
doi_str_mv 10.1093/bib/bbz021
format Article
fullrecord <record><control><sourceid>proquest_TOX</sourceid><recordid>TN_cdi_proquest_miscellaneous_2190492408</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><oup_id>10.1093/bib/bbz021</oup_id><sourcerecordid>2429010922</sourcerecordid><originalsourceid>FETCH-LOGICAL-c345t-9a37cccefbc46bd9a31cae145727990742bc8a48c3c91ae28912beb76737f9b3</originalsourceid><addsrcrecordid>eNp90E9LwzAcxvEgipvTiy9AAiKIUJek6dJ4k7GpoAiye0nSX2dH_8ykPXQn34Pv0FditFPEg6ck8OEhfBE6puSSEhmOda7HWm8IoztoSLkQAScR3_11H6AD51aEMCJiuo8GIYknJBJ0iPRDXs2Cp_nsCqfQgC3zCnDzDLheN3mpCuxa7aDBma1L7BnWHfYmL_NNXi2_ZC8CZUxrleneX99SyPxMiqECu-wO0V6mCgdH23OEFvPZYnob3D_e3E2v7wMT8qgJpAqFMQYybfhEp_5JjQLKI8GElERwpk2seGxCI6kCFkvKNGgxEaHIpA5H6LyfXdv6pQXXJGXuDBSFqqBuXcKoJFwyTmJPT__QVd3ayn8uYZxJ4qsy5tVFr4ytnbOQJWvrk9guoST5DJ_48Ekf3uOT7WSrS0h_6HdpD856ULfr_4Y-AMNSjBU</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2429010922</pqid></control><display><type>article</type><title>MinE-RFE: determine the optimal subset from RFE by minimizing the subset-accuracy–defined energy</title><source>Oxford Journals Open Access Collection</source><creator>Su, Ran ; Liu, Xinyi ; Wei, Leyi</creator><creatorcontrib>Su, Ran ; Liu, Xinyi ; Wei, Leyi</creatorcontrib><description>Abstract Recursive feature elimination (RFE), as one of the most popular feature selection algorithms, has been extensively applied to bioinformatics. During the training, a group of candidate subsets are generated by iteratively eliminating the least important features from the original features. However, how to determine the optimal subset from them still remains ambiguous. Among most current studies, either overall accuracy or subset size (SS) is used to select the most predictive features. Using which one or both and how they affect the prediction performance are still open questions. 
In this study, we proposed MinE-RFE, a novel RFE-based feature selection approach by sufficiently considering the effect of both factors. Subset decision problem was reflected into subset-accuracy space and became an energy-minimization problem. We also provided a mathematical description of the relationship between the overall accuracy and SS using Gaussian Mixture Models together with spline fitting. Besides, we comprehensively reviewed a variety of state-of-the-art applications in bioinformatics using RFE. We compared their approaches of deciding the final subset from all the candidate subsets with MinE-RFE on diverse bioinformatics data sets. Additionally, we also compared MinE-RFE with some well-used feature selection algorithms. The comparative results demonstrate that the proposed approach exhibits the best performance among all the approaches. To facilitate the use of MinE-RFE, we further established a user-friendly web server with the implementation of the proposed approach, which is accessible at http://qgking.wicp.net/MinE/. We expect this web server will be a useful tool for research community.</description><identifier>ISSN: 1477-4054</identifier><identifier>ISSN: 1467-5463</identifier><identifier>EISSN: 1477-4054</identifier><identifier>DOI: 10.1093/bib/bbz021</identifier><identifier>PMID: 30860571</identifier><language>eng</language><publisher>England: Oxford University Press</publisher><subject>Accuracy ; Algorithms ; Bioinformatics ; Energy conservation ; Feature selection ; Internet ; Mathematical models ; Model accuracy ; Optimization ; Probabilistic models ; Servers</subject><ispartof>Briefings in bioinformatics, 2020-03, Vol.21 (2), p.687-698</ispartof><rights>The Author(s) 2019. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com 2019</rights><rights>The Author(s) 2019. Published by Oxford University Press. All rights reserved. 
For Permissions, please email: journals.permissions@oup.com.</rights><rights>The Author(s) 2019. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c345t-9a37cccefbc46bd9a31cae145727990742bc8a48c3c91ae28912beb76737f9b3</citedby><cites>FETCH-LOGICAL-c345t-9a37cccefbc46bd9a31cae145727990742bc8a48c3c91ae28912beb76737f9b3</cites><orcidid>0000-0003-1444-190X</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,776,780,1598,27901,27902</link.rule.ids><linktorsrc>$$Uhttps://dx.doi.org/10.1093/bib/bbz021$$EView_record_in_Oxford_University_Press$$FView_record_in_$$GOxford_University_Press</linktorsrc><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/30860571$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Su, Ran</creatorcontrib><creatorcontrib>Liu, Xinyi</creatorcontrib><creatorcontrib>Wei, Leyi</creatorcontrib><title>MinE-RFE: determine the optimal subset from RFE by minimizing the subset-accuracy–defined energy</title><title>Briefings in bioinformatics</title><addtitle>Brief Bioinform</addtitle><description>Abstract Recursive feature elimination (RFE), as one of the most popular feature selection algorithms, has been extensively applied to bioinformatics. During the training, a group of candidate subsets are generated by iteratively eliminating the least important features from the original features. However, how to determine the optimal subset from them still remains ambiguous. Among most current studies, either overall accuracy or subset size (SS) is used to select the most predictive features. Using which one or both and how they affect the prediction performance are still open questions. 
In this study, we proposed MinE-RFE, a novel RFE-based feature selection approach by sufficiently considering the effect of both factors. Subset decision problem was reflected into subset-accuracy space and became an energy-minimization problem. We also provided a mathematical description of the relationship between the overall accuracy and SS using Gaussian Mixture Models together with spline fitting. Besides, we comprehensively reviewed a variety of state-of-the-art applications in bioinformatics using RFE. We compared their approaches of deciding the final subset from all the candidate subsets with MinE-RFE on diverse bioinformatics data sets. Additionally, we also compared MinE-RFE with some well-used feature selection algorithms. The comparative results demonstrate that the proposed approach exhibits the best performance among all the approaches. To facilitate the use of MinE-RFE, we further established a user-friendly web server with the implementation of the proposed approach, which is accessible at http://qgking.wicp.net/MinE/. 
We expect this web server will be a useful tool for research community.</description><subject>Accuracy</subject><subject>Algorithms</subject><subject>Bioinformatics</subject><subject>Energy conservation</subject><subject>Feature selection</subject><subject>Internet</subject><subject>Mathematical models</subject><subject>Model accuracy</subject><subject>Optimization</subject><subject>Probabilistic models</subject><subject>Servers</subject><issn>1477-4054</issn><issn>1467-5463</issn><issn>1477-4054</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><recordid>eNp90E9LwzAcxvEgipvTiy9AAiKIUJek6dJ4k7GpoAiye0nSX2dH_8ykPXQn34Pv0FditFPEg6ck8OEhfBE6puSSEhmOda7HWm8IoztoSLkQAScR3_11H6AD51aEMCJiuo8GIYknJBJ0iPRDXs2Cp_nsCqfQgC3zCnDzDLheN3mpCuxa7aDBma1L7BnWHfYmL_NNXi2_ZC8CZUxrleneX99SyPxMiqECu-wO0V6mCgdH23OEFvPZYnob3D_e3E2v7wMT8qgJpAqFMQYybfhEp_5JjQLKI8GElERwpk2seGxCI6kCFkvKNGgxEaHIpA5H6LyfXdv6pQXXJGXuDBSFqqBuXcKoJFwyTmJPT__QVd3ayn8uYZxJ4qsy5tVFr4ytnbOQJWvrk9guoST5DJ_48Ekf3uOT7WSrS0h_6HdpD856ULfr_4Y-AMNSjBU</recordid><startdate>20200323</startdate><enddate>20200323</enddate><creator>Su, Ran</creator><creator>Liu, Xinyi</creator><creator>Wei, Leyi</creator><general>Oxford University Press</general><general>Oxford Publishing Limited (England)</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7QO</scope><scope>7SC</scope><scope>8FD</scope><scope>FR3</scope><scope>JQ2</scope><scope>K9.</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>P64</scope><scope>RC3</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0003-1444-190X</orcidid></search><sort><creationdate>20200323</creationdate><title>MinE-RFE: determine the optimal subset from RFE by minimizing the subset-accuracy–defined energy</title><author>Su, Ran ; Liu, Xinyi ; Wei, 
Leyi</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c345t-9a37cccefbc46bd9a31cae145727990742bc8a48c3c91ae28912beb76737f9b3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Accuracy</topic><topic>Algorithms</topic><topic>Bioinformatics</topic><topic>Energy conservation</topic><topic>Feature selection</topic><topic>Internet</topic><topic>Mathematical models</topic><topic>Model accuracy</topic><topic>Optimization</topic><topic>Probabilistic models</topic><topic>Servers</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Su, Ran</creatorcontrib><creatorcontrib>Liu, Xinyi</creatorcontrib><creatorcontrib>Wei, Leyi</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>Biotechnology Research Abstracts</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>Engineering Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><jtitle>Briefings in bioinformatics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Su, Ran</au><au>Liu, Xinyi</au><au>Wei, Leyi</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>MinE-RFE: determine the optimal subset from RFE by minimizing the subset-accuracy–defined 
energy</atitle><jtitle>Briefings in bioinformatics</jtitle><addtitle>Brief Bioinform</addtitle><date>2020-03-23</date><risdate>2020</risdate><volume>21</volume><issue>2</issue><spage>687</spage><epage>698</epage><pages>687-698</pages><issn>1477-4054</issn><issn>1467-5463</issn><eissn>1477-4054</eissn><abstract>Abstract Recursive feature elimination (RFE), as one of the most popular feature selection algorithms, has been extensively applied to bioinformatics. During the training, a group of candidate subsets are generated by iteratively eliminating the least important features from the original features. However, how to determine the optimal subset from them still remains ambiguous. Among most current studies, either overall accuracy or subset size (SS) is used to select the most predictive features. Using which one or both and how they affect the prediction performance are still open questions. In this study, we proposed MinE-RFE, a novel RFE-based feature selection approach by sufficiently considering the effect of both factors. Subset decision problem was reflected into subset-accuracy space and became an energy-minimization problem. We also provided a mathematical description of the relationship between the overall accuracy and SS using Gaussian Mixture Models together with spline fitting. Besides, we comprehensively reviewed a variety of state-of-the-art applications in bioinformatics using RFE. We compared their approaches of deciding the final subset from all the candidate subsets with MinE-RFE on diverse bioinformatics data sets. Additionally, we also compared MinE-RFE with some well-used feature selection algorithms. The comparative results demonstrate that the proposed approach exhibits the best performance among all the approaches. To facilitate the use of MinE-RFE, we further established a user-friendly web server with the implementation of the proposed approach, which is accessible at http://qgking.wicp.net/MinE/. 
We expect this web server will be a useful tool for research community.</abstract><cop>England</cop><pub>Oxford University Press</pub><pmid>30860571</pmid><doi>10.1093/bib/bbz021</doi><tpages>12</tpages><orcidid>https://orcid.org/0000-0003-1444-190X</orcidid></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1477-4054
ispartof Briefings in bioinformatics, 2020-03, Vol.21 (2), p.687-698
issn 1477-4054
1467-5463
language eng
recordid cdi_proquest_miscellaneous_2190492408
source Oxford Journals Open Access Collection
subjects Accuracy
Algorithms
Bioinformatics
Energy conservation
Feature selection
Internet
Mathematical models
Model accuracy
Optimization
Probabilistic models
Servers
title MinE-RFE: determine the optimal subset from RFE by minimizing the subset-accuracy–defined energy
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-10T16%3A06%3A35IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_TOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=MinE-RFE:%20determine%20the%20optimal%20subset%20from%20RFE%20by%20minimizing%20the%20subset-accuracy%E2%80%93defined%20energy&rft.jtitle=Briefings%20in%20bioinformatics&rft.au=Su,%20Ran&rft.date=2020-03-23&rft.volume=21&rft.issue=2&rft.spage=687&rft.epage=698&rft.pages=687-698&rft.issn=1477-4054&rft.eissn=1477-4054&rft_id=info:doi/10.1093/bib/bbz021&rft_dat=%3Cproquest_TOX%3E2429010922%3C/proquest_TOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2429010922&rft_id=info:pmid/30860571&rft_oup_id=10.1093/bib/bbz021&rfr_iscdi=true
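
Illustrative note on the abstract: the description field above says that RFE produces a sequence of candidate subsets by repeatedly removing the least important feature, and that MinE-RFE picks one of them by minimizing an energy defined in subset-accuracy space. This record does not give the energy function, so the Python sketch below is only an illustration under stated assumptions and not the authors' method. The function names (rfe_candidates, toy_energy, select_subset), the weight alpha, the linear form of the energy, and the toy importance and accuracy values are all hypothetical. The Gaussian Mixture Model and spline fitting mentioned in the abstract are not modeled here.

```python
from typing import Callable, List, Sequence, Tuple

Subset = Tuple[int, ...]


def rfe_candidates(
    features: Sequence[int],
    importance: Callable[[Subset], List[float]],
) -> List[Subset]:
    """Build nested candidate subsets by dropping the least important
    feature one at a time, as in plain RFE."""
    current: Subset = tuple(features)
    candidates = [current]
    while len(current) > 1:
        scores = importance(current)
        drop = min(range(len(current)), key=lambda i: scores[i])
        current = current[:drop] + current[drop + 1:]
        candidates.append(current)
    return candidates


def toy_energy(accuracy: float, size: int, n_total: int, alpha: float = 0.8) -> float:
    """HYPOTHETICAL energy: a weighted sum of error and relative subset size.
    The record does not state the paper's actual definition."""
    return alpha * (1.0 - accuracy) + (1.0 - alpha) * size / n_total


def select_subset(
    candidates: List[Subset],
    accuracy_of: Callable[[Subset], float],
    alpha: float = 0.8,
) -> Subset:
    """Return the candidate subset with the lowest toy energy."""
    n_total = len(candidates[0])
    return min(
        candidates,
        key=lambda s: toy_energy(accuracy_of(s), len(s), n_total, alpha),
    )


# Made-up data: a fixed "importance" per feature and an accuracy that grows
# with the total importance of the kept features (assumption, not real data).
WEIGHTS = {0: 0.30, 1: 0.25, 2: 0.02, 3: 0.01, 4: 0.20}


def importance(subset: Subset) -> List[float]:
    return [WEIGHTS[f] for f in subset]


def accuracy_of(subset: Subset) -> float:
    return 0.2 + sum(WEIGHTS[f] for f in subset)


candidates = rfe_candidates(range(5), importance)
best = select_subset(candidates, accuracy_of)
print(candidates)
print(best)
```

In this toy setting the two weakest features are dropped because they add almost no accuracy, while removing any further feature costs more accuracy than the smaller subset size saves. With a real classifier, importance and accuracy_of would be replaced by model-based feature rankings and cross-validated accuracy.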