Sequence-Based Prediction of RNA-Binding Proteins Using Random Forest with Minimum Redundancy Maximum Relevance Feature Selection

The prediction of RNA-binding proteins is one of the most challenging problems in computational biology. Although some studies have investigated this problem, prediction accuracy remains insufficient. In this study, a highly accurate method was developed to predict RNA-binding proteins from...

Bibliographic Details
Published in: BioMed research international 2015-01, Vol.2015 (2015), p.1-10
Main Authors: Ma, Xin, Sun, Xiao, Guo, Jing
Format: Article
Language: eng
Online Access: Full text
container_end_page 10
container_issue 2015
container_start_page 1
container_title BioMed research international
container_volume 2015
creator Ma, Xin
Sun, Xiao
Guo, Jing
description The prediction of RNA-binding proteins is one of the most challenging problems in computational biology. Although some studies have investigated this problem, prediction accuracy remains insufficient. In this study, a highly accurate method was developed to predict RNA-binding proteins from amino acid sequences using random forests with the minimum redundancy maximum relevance (mRMR) method, followed by incremental feature selection (IFS). We incorporated conjoint triad features and three novel features: binding propensity (BP), nonbinding propensity (NBP), and evolutionary information combined with physicochemical properties (EIPP). The results showed that these novel features play important roles in improving the performance of the predictor. Using the mRMR-IFS method, our predictor achieved its best performance (86.62% accuracy and a Matthews correlation coefficient of 0.737). This high prediction accuracy suggests that our method can be a useful approach for identifying RNA-binding proteins from sequence information.
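The two reported metrics and the IFS step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `evaluate` callback (standing in for, say, a cross-validated random forest score), and the assumption that features arrive already ranked by mRMR are all hypothetical choices made for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of correct predictions from confusion-matrix counts."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def incremental_feature_selection(
    ranked_features: Sequence[str],
    evaluate: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Score the top-k prefix of a ranked feature list for every k and keep the best.

    `ranked_features` is assumed to be sorted best-first (for example by mRMR);
    `evaluate` is a hypothetical callback returning a score for a feature subset.
    """
    best_subset: List[str] = []
    best_score = float("-inf")
    for k in range(1, len(ranked_features) + 1):
        subset = list(ranked_features[:k])
        score = evaluate(subset)
        if score > best_score:  # ties keep the smaller subset
            best_subset, best_score = subset, score
    return best_subset, best_score
```

In practice `evaluate` would train and validate a classifier on the chosen columns; keeping it as a callback leaves the selection loop independent of any particular learning library.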
doi_str_mv 10.1155/2015/425810
format Article
contributor McGuffin, Liam
publisher Cairo, Egypt: Hindawi Publishing Corporation
pmid 26543860
orcidid https://orcid.org/0000-0002-8101-7271
rights Copyright © 2015 Xin Ma et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
oa free_for_read
linktohtml https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4620426/
linktopdf https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4620426/pdf/
fulltext fulltext
identifier ISSN: 2314-6133
ispartof BioMed research international, 2015-01, Vol.2015 (2015), p.1-10
issn 2314-6133
2314-6141
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_4620426
source MEDLINE; Wiley Open Access; PubMed Central; Alma/SFX Local Collection; PubMed Central Open Access
subjects Algorithms
Amino Acids - chemistry
Binding proteins
Biomedical research
Computational Biology - methods
Databases, Protein
Health aspects
Hydrophobic and Hydrophilic Interactions
Methods
Models, Statistical
Reproducibility of Results
RNA - chemistry
RNA sequencing
RNA-Binding Proteins - chemistry
Static Electricity
title Sequence-Based Prediction of RNA-Binding Proteins Using Random Forest with Minimum Redundancy Maximum Relevance Feature Selection
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-14T15%3A48%3A34IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Sequence-Based%20Prediction%20of%20RNA-Binding%20Proteins%20Using%20Random%20Forest%20with%20Minimum%20Redundancy%20Maximum%20Relevance%20Feature%20Selection&rft.jtitle=BioMed%20research%20international&rft.au=Ma,%20Xin&rft.date=2015-01-01&rft.volume=2015&rft.issue=2015&rft.spage=1&rft.epage=10&rft.pages=1-10&rft.issn=2314-6133&rft.eissn=2314-6141&rft_id=info:doi/10.1155/2015/425810&rft_dat=%3Cgale_pubme%3EA458161419%3C/gale_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1726684057&rft_id=info:pmid/26543860&rft_galeid=A458161419&rfr_iscdi=true