pSuc-Lys: Predict lysine succinylation sites in proteins with PseAAC and ensemble random forest approach

Being one type of post-translational modifications (PTMs), protein lysine succinylation is important in regulating varieties of biological processes. It is also involved with some diseases, however. Consequently, from the angles of both basic research and drug development, we are facing a challengin...

Bibliographic details
Published in: Journal of theoretical biology 2016-04, Vol.394, p.223-230
Main authors: Jia, Jianhua; Liu, Zi; Xiao, Xuan; Liu, Bingxiang; Chou, Kuo-Chen
Format: Article
Language: eng
Online access: Full text
container_end_page 230
container_issue
container_start_page 223
container_title Journal of theoretical biology
container_volume 394
creator Jia, Jianhua
Liu, Zi
Xiao, Xuan
Liu, Bingxiang
Chou, Kuo-Chen
description Being one type of post-translational modifications (PTMs), protein lysine succinylation is important in regulating varieties of biological processes. It is also involved with some diseases, however. Consequently, from the angles of both basic research and drug development, we are facing a challenging problem: for an uncharacterized protein sequence having many Lys residues therein, which ones can be succinylated, and which ones cannot? To address this problem, we have developed a predictor called pSuc-Lys through (1) incorporating the sequence-coupled information into the general pseudo amino acid composition, (2) balancing out skewed training dataset by random sampling, and (3) constructing an ensemble predictor by fusing a series of individual random forest classifiers. Rigorous cross-validations indicated that it remarkably outperformed the existing methods. A user-friendly web-server for pSuc-Lys has been established at http://www.jci-bioinfo.cn/pSuc-Lys, by which users can easily obtain their desired results without the need to go through the complicated mathematical equations involved. It has not escaped our notice that the formulation and approach presented here can also be used to analyze many other problems in computational proteomics.
• Succinylation plays an important role in regulating various biological processes.
• A novel ensemble classifier has been developed to predict protein succinylation sites.
• It was formed by fusing a series of individual random forest classifiers via a voting system.
• A user-friendly web-server has been established.
doi_str_mv 10.1016/j.jtbi.2016.01.020
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_1769620112</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0022519316000539</els_id><sourcerecordid>1769620112</sourcerecordid><originalsourceid>FETCH-LOGICAL-c356t-80b38a54152ac073810b5e2270180298761dd3b0324283c33093c6d846b6a92c3</originalsourceid><addsrcrecordid>eNp9kEGP0zAQhS0EYrsLf4AD8pFLwthuHAdxqSpgkSqxEnC2HGequkqc4nFY9d_jqgtHTjMjvfdm5mPsjYBagNDvj_Ux96GWpa9B1CDhGVsJ6JrKNGvxnK0ApKwa0akbdkt0BIBurfRLdiO1gdaAXrHD6fviq92ZPvCHhEPwmY9nChE5Ld6HeB5dDnPkFDISD5Gf0pwxROKPIR_4A-Fms-UuDhwj4dSPyFOZ5onv54SUuTsVh_OHV-zF3o2Er5_qHfv5-dOP7X21-_bl63azq7xqdK4M9Mq4cn4jnYdWGQF9g1K2IAzIzrRaDIPqQcm1NMorBZ3yejBr3WvXSa_u2Ltrbln7aykX2CmQx3F0EeeFrGh1pwsyIYtUXqU-zUQJ9_aUwuTS2QqwF8L2aC-E7YWwBWEL4WJ6-5S_9BMO_yx_kRbBx6sAy5e_AyZLPmD0BW5Cn-0wh__l_wGnOIug</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1769620112</pqid></control><display><type>article</type><title>pSuc-Lys: Predict lysine succinylation sites in proteins with PseAAC and ensemble random forest approach</title><source>MEDLINE</source><source>Elsevier ScienceDirect Journals</source><creator>Jia, Jianhua ; Liu, Zi ; Xiao, Xuan ; Liu, Bingxiang ; Chou, Kuo-Chen</creator><creatorcontrib>Jia, Jianhua ; Liu, Zi ; Xiao, Xuan ; Liu, Bingxiang ; Chou, Kuo-Chen</creatorcontrib><description>Being one type of post-translational modifications (PTMs), protein lysine succinylation is important in regulating varieties of biological processes. It is also involved with some diseases, however. Consequently, from the angles of both basic research and drug development, we are facing a challenging problem: for an uncharacterized protein sequence having many Lys residues therein, which ones can be succinylated, and which ones cannot? 
To address this problem, we have developed a predictor called pSuc-Lys through (1) incorporating the sequence-coupled information into the general pseudo amino acid composition, (2) balancing out skewed training dataset by random sampling, and (3) constructing an ensemble predictor by fusing a series of individual random forest classifiers. Rigorous cross-validations indicated that it remarkably outperformed the existing methods. A user-friendly web-server for pSuc-Lys has been established at http://www.jci-bioinfo.cn/pSuc-Lys, by which users can easily obtain their desired results without the need to go through the complicated mathematical equations involved. It has not escaped our notice that the formulation and approach presented here can also be used to analyze many other problems in computational proteomics. •Succinylation plays an important role in regulating various biological processes.•A novel ensemble classifier has been developed to predict protein succinylation sites.•It was formed by fusing a series of individual random forest classifiers via a voting system.•A user-friendly web-server has been established.</description><identifier>ISSN: 0022-5193</identifier><identifier>EISSN: 1095-8541</identifier><identifier>DOI: 10.1016/j.jtbi.2016.01.020</identifier><identifier>PMID: 26807806</identifier><language>eng</language><publisher>England: Elsevier Ltd</publisher><subject>Algorithms ; Databases, Protein ; Ensemble random forest ; General PseAAC ; Lysine - metabolism ; Lysine succinylation ; Proteins - metabolism ; pSuc-Lys web-server ; Random downsampling ; Reproducibility of Results ; Sequence-coupling model ; Software ; Succinic Acid - metabolism</subject><ispartof>Journal of theoretical biology, 2016-04, Vol.394, p.223-230</ispartof><rights>2016 Elsevier Ltd</rights><rights>Copyright © 2016 Elsevier Ltd. 
All rights reserved.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c356t-80b38a54152ac073810b5e2270180298761dd3b0324283c33093c6d846b6a92c3</citedby><cites>FETCH-LOGICAL-c356t-80b38a54152ac073810b5e2270180298761dd3b0324283c33093c6d846b6a92c3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.sciencedirect.com/science/article/pii/S0022519316000539$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>314,776,780,3537,27901,27902,65306</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/26807806$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Jia, Jianhua</creatorcontrib><creatorcontrib>Liu, Zi</creatorcontrib><creatorcontrib>Xiao, Xuan</creatorcontrib><creatorcontrib>Liu, Bingxiang</creatorcontrib><creatorcontrib>Chou, Kuo-Chen</creatorcontrib><title>pSuc-Lys: Predict lysine succinylation sites in proteins with PseAAC and ensemble random forest approach</title><title>Journal of theoretical biology</title><addtitle>J Theor Biol</addtitle><description>Being one type of post-translational modifications (PTMs), protein lysine succinylation is important in regulating varieties of biological processes. It is also involved with some diseases, however. Consequently, from the angles of both basic research and drug development, we are facing a challenging problem: for an uncharacterized protein sequence having many Lys residues therein, which ones can be succinylated, and which ones cannot? 
To address this problem, we have developed a predictor called pSuc-Lys through (1) incorporating the sequence-coupled information into the general pseudo amino acid composition, (2) balancing out skewed training dataset by random sampling, and (3) constructing an ensemble predictor by fusing a series of individual random forest classifiers. Rigorous cross-validations indicated that it remarkably outperformed the existing methods. A user-friendly web-server for pSuc-Lys has been established at http://www.jci-bioinfo.cn/pSuc-Lys, by which users can easily obtain their desired results without the need to go through the complicated mathematical equations involved. It has not escaped our notice that the formulation and approach presented here can also be used to analyze many other problems in computational proteomics. •Succinylation plays an important role in regulating various biological processes.•A novel ensemble classifier has been developed to predict protein succinylation sites.•It was formed by fusing a series of individual random forest classifiers via a voting system.•A user-friendly web-server has been established.</description><subject>Algorithms</subject><subject>Databases, Protein</subject><subject>Ensemble random forest</subject><subject>General PseAAC</subject><subject>Lysine - metabolism</subject><subject>Lysine succinylation</subject><subject>Proteins - metabolism</subject><subject>pSuc-Lys web-server</subject><subject>Random downsampling</subject><subject>Reproducibility of Results</subject><subject>Sequence-coupling model</subject><subject>Software</subject><subject>Succinic Acid - 
metabolism</subject><issn>0022-5193</issn><issn>1095-8541</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2016</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNp9kEGP0zAQhS0EYrsLf4AD8pFLwthuHAdxqSpgkSqxEnC2HGequkqc4nFY9d_jqgtHTjMjvfdm5mPsjYBagNDvj_Ux96GWpa9B1CDhGVsJ6JrKNGvxnK0ApKwa0akbdkt0BIBurfRLdiO1gdaAXrHD6fviq92ZPvCHhEPwmY9nChE5Ld6HeB5dDnPkFDISD5Gf0pwxROKPIR_4A-Fms-UuDhwj4dSPyFOZ5onv54SUuTsVh_OHV-zF3o2Er5_qHfv5-dOP7X21-_bl63azq7xqdK4M9Mq4cn4jnYdWGQF9g1K2IAzIzrRaDIPqQcm1NMorBZ3yejBr3WvXSa_u2Ltrbln7aykX2CmQx3F0EeeFrGh1pwsyIYtUXqU-zUQJ9_aUwuTS2QqwF8L2aC-E7YWwBWEL4WJ6-5S_9BMO_yx_kRbBx6sAy5e_AyZLPmD0BW5Cn-0wh__l_wGnOIug</recordid><startdate>20160407</startdate><enddate>20160407</enddate><creator>Jia, Jianhua</creator><creator>Liu, Zi</creator><creator>Xiao, Xuan</creator><creator>Liu, Bingxiang</creator><creator>Chou, Kuo-Chen</creator><general>Elsevier Ltd</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope></search><sort><creationdate>20160407</creationdate><title>pSuc-Lys: Predict lysine succinylation sites in proteins with PseAAC and ensemble random forest approach</title><author>Jia, Jianhua ; Liu, Zi ; Xiao, Xuan ; Liu, Bingxiang ; Chou, Kuo-Chen</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c356t-80b38a54152ac073810b5e2270180298761dd3b0324283c33093c6d846b6a92c3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2016</creationdate><topic>Algorithms</topic><topic>Databases, Protein</topic><topic>Ensemble random forest</topic><topic>General PseAAC</topic><topic>Lysine - metabolism</topic><topic>Lysine succinylation</topic><topic>Proteins - metabolism</topic><topic>pSuc-Lys web-server</topic><topic>Random downsampling</topic><topic>Reproducibility of Results</topic><topic>Sequence-coupling 
model</topic><topic>Software</topic><topic>Succinic Acid - metabolism</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Jia, Jianhua</creatorcontrib><creatorcontrib>Liu, Zi</creatorcontrib><creatorcontrib>Xiao, Xuan</creatorcontrib><creatorcontrib>Liu, Bingxiang</creatorcontrib><creatorcontrib>Chou, Kuo-Chen</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><jtitle>Journal of theoretical biology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Jia, Jianhua</au><au>Liu, Zi</au><au>Xiao, Xuan</au><au>Liu, Bingxiang</au><au>Chou, Kuo-Chen</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>pSuc-Lys: Predict lysine succinylation sites in proteins with PseAAC and ensemble random forest approach</atitle><jtitle>Journal of theoretical biology</jtitle><addtitle>J Theor Biol</addtitle><date>2016-04-07</date><risdate>2016</risdate><volume>394</volume><spage>223</spage><epage>230</epage><pages>223-230</pages><issn>0022-5193</issn><eissn>1095-8541</eissn><abstract>Being one type of post-translational modifications (PTMs), protein lysine succinylation is important in regulating varieties of biological processes. It is also involved with some diseases, however. Consequently, from the angles of both basic research and drug development, we are facing a challenging problem: for an uncharacterized protein sequence having many Lys residues therein, which ones can be succinylated, and which ones cannot? 
To address this problem, we have developed a predictor called pSuc-Lys through (1) incorporating the sequence-coupled information into the general pseudo amino acid composition, (2) balancing out skewed training dataset by random sampling, and (3) constructing an ensemble predictor by fusing a series of individual random forest classifiers. Rigorous cross-validations indicated that it remarkably outperformed the existing methods. A user-friendly web-server for pSuc-Lys has been established at http://www.jci-bioinfo.cn/pSuc-Lys, by which users can easily obtain their desired results without the need to go through the complicated mathematical equations involved. It has not escaped our notice that the formulation and approach presented here can also be used to analyze many other problems in computational proteomics. •Succinylation plays an important role in regulating various biological processes.•A novel ensemble classifier has been developed to predict protein succinylation sites.•It was formed by fusing a series of individual random forest classifiers via a voting system.•A user-friendly web-server has been established.</abstract><cop>England</cop><pub>Elsevier Ltd</pub><pmid>26807806</pmid><doi>10.1016/j.jtbi.2016.01.020</doi><tpages>8</tpages></addata></record>
fulltext fulltext
identifier ISSN: 0022-5193
ispartof Journal of theoretical biology, 2016-04, Vol.394, p.223-230
issn 0022-5193
1095-8541
language eng
recordid cdi_proquest_miscellaneous_1769620112
source MEDLINE; Elsevier ScienceDirect Journals
subjects Algorithms
Databases, Protein
Ensemble random forest
General PseAAC
Lysine - metabolism
Lysine succinylation
Proteins - metabolism
pSuc-Lys web-server
Random downsampling
Reproducibility of Results
Sequence-coupling model
Software
Succinic Acid - metabolism
title pSuc-Lys: Predict lysine succinylation sites in proteins with PseAAC and ensemble random forest approach
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-03T00%3A48%3A09IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=pSuc-Lys:%20Predict%20lysine%20succinylation%20sites%20in%20proteins%20with%20PseAAC%20and%20ensemble%20random%20forest%20approach&rft.jtitle=Journal%20of%20theoretical%20biology&rft.au=Jia,%20Jianhua&rft.date=2016-04-07&rft.volume=394&rft.spage=223&rft.epage=230&rft.pages=223-230&rft.issn=0022-5193&rft.eissn=1095-8541&rft_id=info:doi/10.1016/j.jtbi.2016.01.020&rft_dat=%3Cproquest_cross%3E1769620112%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1769620112&rft_id=info:pmid/26807806&rft_els_id=S0022519316000539&rfr_iscdi=true
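
Illustrative sketch (not part of the catalog record): the description field names two generic steps, balancing a skewed training set by random sampling and fusing several classifiers through a voting system. The Python below is a minimal sketch of those two steps only, under stated assumptions. The record does not give the number of subsets, the exact voting rule, or the feature encoding, so `n_subsets`, the tie-breaking rule, and the toy two-dimensional features are all assumptions. The base learner here is a hypothetical nearest-centroid stand-in, not a random forest and not the authors' PseAAC-based model.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Sample = Tuple[Vector, int]  # (feature vector, label): 1 = positive, 0 = negative


def balanced_subsets(positives: List[Vector], negatives: List[Vector],
                     n_subsets: int, seed: int = 0) -> List[List[Sample]]:
    """Build n_subsets balanced training sets by random downsampling.

    Each subset holds k positives and k negatives, where k is the size of
    the smaller class, so the larger class is downsampled in every draw.
    """
    rng = random.Random(seed)
    k = min(len(positives), len(negatives))
    subsets = []
    for _ in range(n_subsets):
        pos_draw = rng.sample(positives, k)
        neg_draw = rng.sample(negatives, k)
        subsets.append([(x, 1) for x in pos_draw] + [(x, 0) for x in neg_draw])
    return subsets


def train_centroid_classifier(samples: List[Sample]) -> Callable[[Vector], int]:
    """Hypothetical stand-in base learner (NOT a random forest).

    It stores one centroid per class and predicts the class whose
    centroid is closer in squared Euclidean distance.
    """
    centroids = {}
    for label in (0, 1):
        rows = [x for x, y in samples if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x: Vector) -> int:
        def dist2(c: Vector) -> float:
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return 1 if dist2(centroids[1]) < dist2(centroids[0]) else 0

    return predict


def vote(classifiers: List[Callable[[Vector], int]], x: Vector) -> int:
    """Fuse base predictions by simple majority; ties go to the negative class."""
    positive_votes = sum(clf(x) for clf in classifiers)
    return 1 if 2 * positive_votes > len(classifiers) else 0


# Toy data (invented for illustration): 3 positives, 6 negatives.
positives = [[1.0, 1.0], [0.9, 1.1], [1.1, 0.8]]
negatives = [[0.0, 0.1], [0.1, -0.1], [-0.2, 0.0],
             [0.2, 0.2], [0.0, -0.3], [-0.1, 0.1]]

subsets = balanced_subsets(positives, negatives, n_subsets=5)
ensemble = [train_centroid_classifier(s) for s in subsets]

print(vote(ensemble, [1.0, 0.9]))
print(vote(ensemble, [0.0, 0.0]))
```

Training each base learner on a different random draw of the majority class lets the ensemble see most of the negative examples overall while every individual learner still trains on balanced data. Replacing `train_centroid_classifier` with a real random forest trainer would leave the sampling and voting code unchanged.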