MinE-RFE: determine the optimal subset from RFE by minimizing the subset-accuracy–defined energy
Published in: | Briefings in bioinformatics 2020-03, Vol.21 (2), p.687-698 |
---|---|
Main authors: | Su, Ran; Liu, Xinyi; Wei, Leyi |
Format: | Article |
Language: | eng |
container_end_page | 698 |
---|---|
container_issue | 2 |
container_start_page | 687 |
container_title | Briefings in bioinformatics |
container_volume | 21 |
creator | Su, Ran; Liu, Xinyi; Wei, Leyi |
description | Abstract
Recursive feature elimination (RFE), one of the most popular feature selection algorithms, has been extensively applied in bioinformatics. During training, a group of candidate subsets is generated by iteratively eliminating the least important features from the original feature set. However, how to determine the optimal subset among these candidates remains unclear. Most current studies use either overall accuracy or subset size (SS) to select the most predictive features. Which criterion to use (one or both), and how each affects prediction performance, are still open questions. In this study, we proposed MinE-RFE, a novel RFE-based feature selection approach that takes the effect of both factors fully into account. The subset decision problem was mapped onto the subset-accuracy space and formulated as an energy-minimization problem. We also provided a mathematical description of the relationship between overall accuracy and SS using Gaussian Mixture Models together with spline fitting. In addition, we comprehensively reviewed a variety of state-of-the-art applications of RFE in bioinformatics. On diverse bioinformatics data sets, we compared MinE-RFE with the approaches these applications use to choose the final subset from all candidate subsets. We also compared MinE-RFE with several widely used feature selection algorithms. The comparative results demonstrate that the proposed approach achieves the best performance among all the approaches compared. To facilitate the use of MinE-RFE, we established a user-friendly web server implementing the proposed approach, accessible at http://qgking.wicp.net/MinE/. We expect this web server to be a useful tool for the research community. |
doi_str_mv | 10.1093/bib/bbz021 |
format | Article |
pmid | 30860571 |
publisher | Oxford University Press (England) |
date | 2020-03-23 |
addtitle | Brief Bioinform |
orcidid | 0000-0003-1444-190X |
rights | The Author(s) 2019. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1477-4054 |
ispartof | Briefings in bioinformatics, 2020-03, Vol.21 (2), p.687-698 |
issn | 1467-5463 (print); 1477-4054 (electronic) |
language | eng |
recordid | cdi_proquest_miscellaneous_2190492408 |
source | Oxford Journals Open Access Collection |
subjects | Accuracy; Algorithms; Bioinformatics; Energy conservation; Feature selection; Internet; Mathematical models; Model accuracy; Optimization; Probabilistic models; Servers |
title | MinE-RFE: determine the optimal subset from RFE by minimizing the subset-accuracy–defined energy |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-10T16%3A06%3A35IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_TOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=MinE-RFE:%20determine%20the%20optimal%20subset%20from%20RFE%20by%20minimizing%20the%20subset-accuracy%E2%80%93defined%20energy&rft.jtitle=Briefings%20in%20bioinformatics&rft.au=Su,%20Ran&rft.date=2020-03-23&rft.volume=21&rft.issue=2&rft.spage=687&rft.epage=698&rft.pages=687-698&rft.issn=1477-4054&rft.eissn=1477-4054&rft_id=info:doi/10.1093/bib/bbz021&rft_dat=%3Cproquest_TOX%3E2429010922%3C/proquest_TOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2429010922&rft_id=info:pmid/30860571&rft_oup_id=10.1093/bib/bbz021&rfr_iscdi=true |
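The abstract describes RFE producing a series of candidate subsets and then choosing one of them by weighing overall accuracy against subset size. The Python sketch below illustrates that workflow under explicit assumptions:

- `rfe_candidates`, `pick_max_accuracy` and `pick_min_energy` are hypothetical helpers, not part of MinE-RFE or its web server.
- The `importance_fn` and `accuracy_fn` callables stand in for a retrained classifier and a cross-validated accuracy estimate.
- The "energy" is a simple substitute: the Euclidean distance to the ideal corner (smallest size, highest accuracy) of the min-max-normalized subset-accuracy plane. It is not the Gaussian-Mixture-plus-spline energy defined in the paper.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Subset = Tuple[str, ...]
Candidate = Tuple[Subset, float]  # (feature subset, overall accuracy)


def rfe_candidates(
    features: Sequence[str],
    importance_fn: Callable[[Subset], Dict[str, float]],
    accuracy_fn: Callable[[Subset], float],
    step: int = 1,
) -> List[Candidate]:
    """Generic RFE loop: record each subset with its accuracy, then
    eliminate the `step` least important features until one remains."""
    current: Subset = tuple(features)
    candidates: List[Candidate] = []
    while True:
        candidates.append((current, accuracy_fn(current)))
        if len(current) <= 1:
            return candidates
        importance = importance_fn(current)  # re-ranked on the current subset
        ranked = sorted(current, key=lambda f: importance[f])  # least important first
        dropped = set(ranked[: min(step, len(current) - 1)])  # always keep >= 1
        current = tuple(f for f in current if f not in dropped)


def pick_max_accuracy(candidates: List[Candidate]) -> Subset:
    """Accuracy-only rule: best accuracy, ties broken by the smaller subset."""
    return max(candidates, key=lambda c: (c[1], -len(c[0])))[0]


def pick_min_energy(candidates: List[Candidate]) -> Subset:
    """HYPOTHETICAL energy (not the paper's GMM + spline definition):
    distance from the ideal corner (smallest size, highest accuracy)
    after min-max normalizing both axes over the candidates."""
    sizes = [len(s) for s, _ in candidates]
    accs = [a for _, a in candidates]
    size_span = (max(sizes) - min(sizes)) or 1  # avoid division by zero
    acc_span = (max(accs) - min(accs)) or 1.0

    def energy(c: Candidate) -> float:
        subset, acc = c
        size_term = (len(subset) - min(sizes)) / size_span  # 0 = smallest subset
        acc_term = (max(accs) - acc) / acc_span  # 0 = most accurate subset
        return math.hypot(size_term, acc_term)

    return min(candidates, key=energy)[0]


# Toy data (invented for illustration): a fixed importance ranking and an
# accuracy that depends only on subset size.
IMPORTANCE = {"f1": 0.9, "f2": 0.7, "f3": 0.5, "f4": 0.2, "f5": 0.1}
ACC_BY_SIZE = {5: 0.90, 4: 0.91, 3: 0.89, 2: 0.75, 1: 0.65}

if __name__ == "__main__":
    cands = rfe_candidates(
        list(IMPORTANCE),
        importance_fn=lambda s: {f: IMPORTANCE[f] for f in s},
        accuracy_fn=lambda s: ACC_BY_SIZE[len(s)],
    )
    print("accuracy-only:", pick_max_accuracy(cands))  # ('f1', 'f2', 'f3', 'f4')
    print("min-energy:   ", pick_min_energy(cands))    # ('f1', 'f2', 'f3')
```

With these made-up numbers, the accuracy-only rule keeps four features for a 0.02 gain over the three-feature subset, while the normalized-distance rule settles on three. The abstract says MinE-RFE formalizes this trade-off between accuracy and subset size, although its actual energy is derived from the fitted accuracy-size relationship rather than from this fixed distance.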