Discrimination of soluble and aggregation-prone proteins based on sequence information

Understanding the factors governing protein solubility is a key to grasp the mechanisms of protein solubility and may provide insight into protein aggregation and misfolding related diseases such as Alzheimer's disease. In this work, we attempt to identify factors important to protein solubilit...
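
The full abstract in this record reports sensitivity, specificity, accuracy (ACC) and the Matthews correlation coefficient (MCC). As a rough illustration of how these four figures relate to a binary confusion matrix, here is a minimal sketch. The function name `binary_metrics` and the counts are hypothetical. The counts assume a balanced set of 200 proteins and were chosen only so that the rates match the reported sensitivity (0.82) and specificity (0.85). The record does not state the actual dataset sizes, and this is not the authors' code.

```python
import math


def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Compute standard binary-classification metrics from confusion-matrix counts.

    tp/fn: positive-class proteins predicted correctly / incorrectly.
    tn/fp: negative-class proteins predicted correctly / incorrectly.
    """
    def ratio(num: float, den: float) -> float:
        # Return 0.0 instead of raising when a class is empty.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)           # true positive rate
    specificity = ratio(tn, tn + fp)           # true negative rate
    accuracy = ratio(tp + tn, tp + fn + tn + fp)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = ratio(tp * tn - fp * fn, mcc_den)    # Matthews correlation coefficient
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only (not taken from the paper).
metrics = binary_metrics(tp=82, fn=18, tn=85, fp=15)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these assumed counts the MCC comes out close to the 0.67 quoted in the abstract, which shows that the reported figures are mutually consistent for a roughly balanced dataset. AUC is omitted because it requires ranked prediction scores rather than counts.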

Detailed description

Bibliographic details
Published in: Molecular bioSystems 2013-01, Vol.9 (4), p.806-811
Main authors: Fang, Yaping; Fang, Jianwen
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 811
container_issue 4
container_start_page 806
container_title Molecular bioSystems
container_volume 9
creator Fang, Yaping
Fang, Jianwen
description Understanding the factors governing protein solubility is a key to grasp the mechanisms of protein solubility and may provide insight into protein aggregation and misfolding related diseases such as Alzheimer's disease. In this work, we attempt to identify factors important to protein solubility using feature selection. Firstly, we calculate 1438 features including physicochemical properties and statistics for each protein. Random Forest algorithm is used to select the most informative and the minimal subset of features based on their predictive performance. A predictive model is built based on 17 selected features. Compared with previous models, our model achieves better performance with a sensitivity of 0.82, specificity 0.85, ACC 0.84, AUC 0.91 and MCC 0.67. Furthermore, a model using a redundancy-reduced dataset (sequence identity ≤ 30%) achieves the same performance as the model without redundancy reduction. Our results provide not only a reliable model for predicting protein solubility but also a list of features important to protein solubility. The predictive model is implemented as a freely available web application at .
doi_str_mv 10.1039/c3mb70033j
format Article
fullrecord <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_3627541</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1315134298</sourcerecordid><originalsourceid>FETCH-LOGICAL-c444t-804ff96c7d738ff942457f4d3942b993bf2c7515f73485ee5771197af7c107303</originalsourceid><addsrcrecordid>eNpVkU1LAzEQhoMotlYv_gDJUYTVZJM0uxdB6icUvKh4C9lssqbsJjXZFfz3xrZWvUxemGfeycwAcIzROUakvFCkqzhChCx2wBhzmmc5Ynh3q6evI3AQ4yIhBcVoH4xyQilCBR6Dl2sbVbCddbK33kFvYPTtULUaSldD2TRBN6tUtgzeaZhir62LsJJR1zCVRP0-aKc0tM740K3gQ7BnZBv10eadgOfbm6fZfTZ_vHuYXc0zRSntswJRY8qp4jUnRVI0p4wbWpOkqrIklckVZ5gZTmjBtGacY1xyabjCiBNEJuBy7bscqk7XSrs-yFYs00QyfAovrfifcfZNNP5DkGnOGcXJ4HRjEHwaI_aiSwvRbSud9kMUmGCGCc3LIqFna1QFH2PQZtsGI_F9CPF7iASf_P3YFv3ZPPkC5NyFZA</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1315134298</pqid></control><display><type>article</type><title>Discrimination of soluble and aggregation-prone proteins based on sequence information</title><source>MEDLINE</source><source>Royal Society Of Chemistry Journals 2008-</source><source>Alma/SFX Local Collection</source><creator>Fang, Yaping ; Fang, Jianwen</creator><creatorcontrib>Fang, Yaping ; Fang, Jianwen</creatorcontrib><description>Understanding the factors governing protein solubility is a key to grasp the mechanisms of protein solubility and may provide insight into protein aggregation and misfolding related diseases such as Alzheimer's disease. In this work, we attempt to identify factors important to protein solubility using feature selection. Firstly, we calculate 1438 features including physicochemical properties and statistics for each protein. Random Forest algorithm is used to select the most informative and the minimal subset of features based on their predictive performance. A predictive model is built based on 17 selected features. 
Compared with previous models, our model achieves better performance with a sensitivity of 0.82, specificity 0.85, ACC 0.84, AUC 0.91 and MCC 0.67. Furthermore, a model using a redundancy-reduced dataset (sequence identity &lt;= 30%) achieves the same performance as the model without redundancy reduction. Our results provide not only a reliable model for predicting protein solubility but also a list of features important to protein solubility. The predictive model is implemented as a freely available web application at .</description><identifier>ISSN: 1742-206X</identifier><identifier>EISSN: 1742-2051</identifier><identifier>DOI: 10.1039/c3mb70033j</identifier><identifier>PMID: 23440081</identifier><language>eng</language><publisher>England</publisher><subject>Algorithms ; Amino Acids - chemistry ; Databases, Protein ; Humans ; Internet ; Models, Theoretical ; Proteins - chemistry ; Sensitivity and Specificity ; Software ; Solubility</subject><ispartof>Molecular bioSystems, 2013-01, Vol.9 (4), p.806-811</ispartof><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c444t-804ff96c7d738ff942457f4d3942b993bf2c7515f73485ee5771197af7c107303</citedby><cites>FETCH-LOGICAL-c444t-804ff96c7d738ff942457f4d3942b993bf2c7515f73485ee5771197af7c107303</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>230,314,780,784,885,27924,27925</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/23440081$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Fang, Yaping</creatorcontrib><creatorcontrib>Fang, Jianwen</creatorcontrib><title>Discrimination of soluble and aggregation-prone proteins based on sequence information</title><title>Molecular bioSystems</title><addtitle>Mol Biosyst</addtitle><description>Understanding the 
factors governing protein solubility is a key to grasp the mechanisms of protein solubility and may provide insight into protein aggregation and misfolding related diseases such as Alzheimer's disease. In this work, we attempt to identify factors important to protein solubility using feature selection. Firstly, we calculate 1438 features including physicochemical properties and statistics for each protein. Random Forest algorithm is used to select the most informative and the minimal subset of features based on their predictive performance. A predictive model is built based on 17 selected features. Compared with previous models, our model achieves better performance with a sensitivity of 0.82, specificity 0.85, ACC 0.84, AUC 0.91 and MCC 0.67. Furthermore, a model using a redundancy-reduced dataset (sequence identity &lt;= 30%) achieves the same performance as the model without redundancy reduction. Our results provide not only a reliable model for predicting protein solubility but also a list of features important to protein solubility. 
The predictive model is implemented as a freely available web application at .</description><subject>Algorithms</subject><subject>Amino Acids - chemistry</subject><subject>Databases, Protein</subject><subject>Humans</subject><subject>Internet</subject><subject>Models, Theoretical</subject><subject>Proteins - chemistry</subject><subject>Sensitivity and Specificity</subject><subject>Software</subject><subject>Solubility</subject><issn>1742-206X</issn><issn>1742-2051</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2013</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNpVkU1LAzEQhoMotlYv_gDJUYTVZJM0uxdB6icUvKh4C9lssqbsJjXZFfz3xrZWvUxemGfeycwAcIzROUakvFCkqzhChCx2wBhzmmc5Ynh3q6evI3AQ4yIhBcVoH4xyQilCBR6Dl2sbVbCddbK33kFvYPTtULUaSldD2TRBN6tUtgzeaZhir62LsJJR1zCVRP0-aKc0tM740K3gQ7BnZBv10eadgOfbm6fZfTZ_vHuYXc0zRSntswJRY8qp4jUnRVI0p4wbWpOkqrIklckVZ5gZTmjBtGacY1xyabjCiBNEJuBy7bscqk7XSrs-yFYs00QyfAovrfifcfZNNP5DkGnOGcXJ4HRjEHwaI_aiSwvRbSud9kMUmGCGCc3LIqFna1QFH2PQZtsGI_F9CPF7iASf_P3YFv3ZPPkC5NyFZA</recordid><startdate>20130101</startdate><enddate>20130101</enddate><creator>Fang, Yaping</creator><creator>Fang, Jianwen</creator><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><scope>5PM</scope></search><sort><creationdate>20130101</creationdate><title>Discrimination of soluble and aggregation-prone proteins based on sequence information</title><author>Fang, Yaping ; Fang, Jianwen</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c444t-804ff96c7d738ff942457f4d3942b993bf2c7515f73485ee5771197af7c107303</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2013</creationdate><topic>Algorithms</topic><topic>Amino Acids - chemistry</topic><topic>Databases, Protein</topic><topic>Humans</topic><topic>Internet</topic><topic>Models, 
Theoretical</topic><topic>Proteins - chemistry</topic><topic>Sensitivity and Specificity</topic><topic>Software</topic><topic>Solubility</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Fang, Yaping</creatorcontrib><creatorcontrib>Fang, Jianwen</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Molecular bioSystems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Fang, Yaping</au><au>Fang, Jianwen</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Discrimination of soluble and aggregation-prone proteins based on sequence information</atitle><jtitle>Molecular bioSystems</jtitle><addtitle>Mol Biosyst</addtitle><date>2013-01-01</date><risdate>2013</risdate><volume>9</volume><issue>4</issue><spage>806</spage><epage>811</epage><pages>806-811</pages><issn>1742-206X</issn><eissn>1742-2051</eissn><abstract>Understanding the factors governing protein solubility is a key to grasp the mechanisms of protein solubility and may provide insight into protein aggregation and misfolding related diseases such as Alzheimer's disease. In this work, we attempt to identify factors important to protein solubility using feature selection. Firstly, we calculate 1438 features including physicochemical properties and statistics for each protein. Random Forest algorithm is used to select the most informative and the minimal subset of features based on their predictive performance. A predictive model is built based on 17 selected features. 
Compared with previous models, our model achieves better performance with a sensitivity of 0.82, specificity 0.85, ACC 0.84, AUC 0.91 and MCC 0.67. Furthermore, a model using a redundancy-reduced dataset (sequence identity &lt;= 30%) achieves the same performance as the model without redundancy reduction. Our results provide not only a reliable model for predicting protein solubility but also a list of features important to protein solubility. The predictive model is implemented as a freely available web application at .</abstract><cop>England</cop><pmid>23440081</pmid><doi>10.1039/c3mb70033j</doi><tpages>6</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1742-206X
ispartof Molecular bioSystems, 2013-01, Vol.9 (4), p.806-811
issn 1742-206X
1742-2051
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_3627541
source MEDLINE; Royal Society Of Chemistry Journals 2008-; Alma/SFX Local Collection
subjects Algorithms
Amino Acids - chemistry
Databases, Protein
Humans
Internet
Models, Theoretical
Proteins - chemistry
Sensitivity and Specificity
Software
Solubility
title Discrimination of soluble and aggregation-prone proteins based on sequence information
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-06T01%3A53%3A30IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Discrimination%20of%20soluble%20and%20aggregation-prone%20proteins%20based%20on%20sequence%20information&rft.jtitle=Molecular%20bioSystems&rft.au=Fang,%20Yaping&rft.date=2013-01-01&rft.volume=9&rft.issue=4&rft.spage=806&rft.epage=811&rft.pages=806-811&rft.issn=1742-206X&rft.eissn=1742-2051&rft_id=info:doi/10.1039/c3mb70033j&rft_dat=%3Cproquest_pubme%3E1315134298%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1315134298&rft_id=info:pmid/23440081&rfr_iscdi=true
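
The resolver link above is an OpenURL (`ctx_ver=Z39.88-2004`) that carries the citation as key/value pairs in its query string. A minimal sketch of reading those fields with the Python standard library follows. The `OPENURL` constant is a shortened copy of the link in this record, with several parameters omitted for brevity. The helper name `parse_openurl` is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit

# Shortened copy of the resolver link from this record (some parameters omitted).
OPENURL = (
    "https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004"
    "&rft.genre=article"
    "&rft.atitle=Discrimination%20of%20soluble%20and%20aggregation-prone"
    "%20proteins%20based%20on%20sequence%20information"
    "&rft.jtitle=Molecular%20bioSystems"
    "&rft.au=Fang,%20Yaping"
    "&rft.date=2013-01-01&rft.volume=9&rft.issue=4"
    "&rft.spage=806&rft.epage=811"
    "&rft.issn=1742-206X&rft.eissn=1742-2051"
    "&rft_id=info:doi/10.1039/c3mb70033j"
    "&rft_id=info:pmid/23440081"
)


def parse_openurl(url: str) -> dict:
    """Split the query string into a dict of lists (keys such as rft_id may repeat)."""
    return parse_qs(urlsplit(url).query)


fields = parse_openurl(OPENURL)

# Referent metadata uses the "rft." prefix; keep the first value of each key.
citation = {
    key[len("rft."):]: values[0]
    for key, values in fields.items()
    if key.startswith("rft.")
}
# Identifiers (DOI, PMID) arrive as repeated rft_id parameters.
identifiers = fields.get("rft_id", [])

print(citation["atitle"])
print(citation["jtitle"], citation["volume"], citation["issue"])
print(identifiers)
```

`parse_qs` returns a list for every key, which is why the repeated `rft_id` parameter yields both the DOI and the PMID rather than only the last one.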