SubMito-XGBoost: predicting protein submitochondrial localization by fusing multiple feature information and eXtreme gradient boosting

Abstract Motivation Mitochondria are an essential organelle in most eukaryotes. They not only play an important role in energy metabolism but also take part in many critical cytopathological processes. Abnormal mitochondria can trigger a series of human diseases, such as Parkinson's disease, mu...

Bibliographic details
Published in: Bioinformatics 2020-02, Vol.36 (4), p.1074-1081
Main authors: Yu, Bin, Qiu, Wenying, Chen, Cheng, Ma, Anjun, Jiang, Jing, Zhou, Hongyan, Ma, Qin
Format: Article
Language: eng
container_end_page 1081
container_issue 4
container_start_page 1074
container_title Bioinformatics
container_volume 36
creator Yu, Bin
Qiu, Wenying
Chen, Cheng
Ma, Anjun
Jiang, Jing
Zhou, Hongyan
Ma, Qin
description Motivation: Mitochondria are essential organelles in most eukaryotes. They not only play an important role in energy metabolism but also take part in many critical cytopathological processes. Abnormal mitochondria can trigger a series of human diseases, such as Parkinson's disease, multifactor disorder and Type-II diabetes. Protein submitochondrial localization enables the understanding of protein function in studying disease pathogenesis and drug design.
Results: We proposed a new method, SubMito-XGBoost, for protein submitochondrial localization prediction. It comprises three steps: (i) the g-gap dipeptide composition (g-gap DC), pseudo-amino acid composition (PseAAC), auto-correlation function (ACF) and Bi-gram position-specific scoring matrix (Bi-gram PSSM) are employed to extract protein sequence features, (ii) the Synthetic Minority Oversampling Technique (SMOTE) is used to balance samples, and the ReliefF algorithm is applied for feature selection and (iii) the obtained feature vectors are fed into XGBoost to predict protein submitochondrial locations. Under leave-one-out cross-validation (LOOCV), SubMito-XGBoost obtained satisfactory prediction results compared with existing methods. The prediction accuracies of SubMito-XGBoost on the two training datasets M317 and M983 were 97.7% and 98.9%, which are 2.8–12.5% and 3.8–9.9% higher than other methods, respectively. The prediction accuracy on the independent test set M495 was 94.8%, which is significantly better than existing studies. The proposed method also achieves satisfactory predictive performance on plant and non-plant protein submitochondrial datasets. SubMito-XGBoost also plays an important role in new drug design for the treatment of related diseases.
Availability and implementation: The source code and data are publicly available at https://github.com/QUST-AIBBDRC/SubMito-XGBoost/.
Supplementary information: Supplementary data are available at Bioinformatics online.
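The first feature named in the abstract, the g-gap dipeptide composition, can be illustrated with a minimal Python sketch. This is not the authors' implementation (their code is in the repository linked above). The function name `g_gap_dipeptide_composition`, the 20-letter alphabet constant, and the choice to normalize by the number of valid pairs and to skip non-standard residues are all assumptions made here for illustration.

```python
from itertools import product

# The 20 standard amino acids; ordering is an arbitrary choice for this sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs
PAIR_INDEX = {pair: i for i, pair in enumerate(PAIRS)}


def g_gap_dipeptide_composition(sequence, g=0):
    """Return a 400-dimensional list of frequencies of residue pairs
    whose two members are separated by g intervening residues.

    g = 0 gives the ordinary (adjacent) dipeptide composition.
    Hypothetical helper: pairs containing non-standard residues
    (for example 'X') are skipped, and frequencies are normalized
    by the number of pairs actually counted.
    """
    if g < 0:
        raise ValueError("g must be non-negative")
    counts = [0] * len(PAIRS)
    total = 0
    # Position i is paired with position i + g + 1.
    for i in range(len(sequence) - g - 1):
        idx = PAIR_INDEX.get(sequence[i] + sequence[i + g + 1])
        if idx is not None:
            counts[idx] += 1
            total += 1
    if total == 0:
        return [0.0] * len(PAIRS)
    return [c / total for c in counts]
```

For the toy sequence `ACAC`, a gap of 0 counts the pairs AC, CA and AC, while a gap of 1 counts AA and CC. In a pipeline like the one described, vectors of this kind would be concatenated with the other feature groups before balancing, feature selection and classification.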
doi_str_mv 10.1093/bioinformatics/btz734
format Article
fullrecord <record><control><sourceid>proquest_TOX</sourceid><recordid>TN_cdi_proquest_miscellaneous_2305029724</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><oup_id>10.1093/bioinformatics/btz734</oup_id><sourcerecordid>2305029724</sourcerecordid><originalsourceid>FETCH-LOGICAL-c463t-ecf550b43c98fabbda9728b5dfa2e76ab55f0a9e4c291c69405fcee41f93c6153</originalsourceid><addsrcrecordid>eNqNkbFOwzAURS0EolD4BJBHllA7ttOGDRAUJBADIHWLbOe5GCVxsZ2hfADfjatAJTYmv-Hce_3eReiEknNKSjZR1tnOON_KaHWYqPg5ZXwHHVBekCwnotxNMyumGZ8RNkKHIbwTIijnfB-NGC0I48XsAH099-rRRpct5lfOhXiBVx5qq6Ptlml0EWyHQ6_axOg319XeygY3TsvGfqZo12G1xqYPG77tm2hXDWADMvYe8PaHCZNdjWERPbSAl17WFrqI1SYzSY_QnpFNgOOfd4xeb29eru-yh6f5_fXlQ6Z5wWIG2ghBFGe6nBmpVC3LaT5TojYyh2khlRCGyBK4zkuqi5ITYTQAp6ZkuqCCjdHZ4JtW--ghxKq1QUPTyA5cH6qcEUHyZMoTKgZUexeCB1OtvG2lX1eUVJsKqr8VVEMFSXf6E5GuBvVW9XvzBJABcP3qn57fO2qebQ</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2305029724</pqid></control><display><type>article</type><title>SubMito-XGBoost: predicting protein submitochondrial localization by fusing multiple feature information and eXtreme gradient boosting</title><source>Oxford Journals Open Access Collection</source><creator>Yu, Bin ; Qiu, Wenying ; Chen, Cheng ; Ma, Anjun ; Jiang, Jing ; Zhou, Hongyan ; Ma, Qin</creator><contributor>Hancock, John</contributor><creatorcontrib>Yu, Bin ; Qiu, Wenying ; Chen, Cheng ; Ma, Anjun ; Jiang, Jing ; Zhou, Hongyan ; Ma, Qin ; Hancock, John</creatorcontrib><description>Abstract Motivation Mitochondria are an essential organelle in most eukaryotes. They not only play an important role in energy metabolism but also take part in many critical cytopathological processes. Abnormal mitochondria can trigger a series of human diseases, such as Parkinson's disease, multifactor disorder and Type-II diabetes. 
Protein submitochondrial localization enables the understanding of protein function in studying disease pathogenesis and drug design. Results We proposed a new method, SubMito-XGBoost, for protein submitochondrial localization prediction. Three steps are included: (i) the g-gap dipeptide composition (g-gap DC), pseudo-amino acid composition (PseAAC), auto-correlation function (ACF) and Bi-gram position-specific scoring matrix (Bi-gram PSSM) are employed to extract protein sequence features, (ii) Synthetic Minority Oversampling Technique (SMOTE) is used to balance samples, and the ReliefF algorithm is applied for feature selection and (iii) the obtained feature vectors are fed into XGBoost to predict protein submitochondrial locations. SubMito-XGBoost has obtained satisfactory prediction results by the leave-one-out-cross-validation (LOOCV) compared with existing methods. The prediction accuracies of the SubMito-XGBoost method on the two training datasets M317 and M983 were 97.7% and 98.9%, which are 2.8–12.5% and 3.8–9.9% higher than other methods, respectively. The prediction accuracy of the independent test set M495 was 94.8%, which is significantly better than the existing studies. The proposed method also achieves satisfactory predictive performance on plant and non-plant protein submitochondrial datasets. SubMito-XGBoost also plays an important role in new drug design for the treatment of related diseases. Availability and implementation The source codes and data are publicly available at https://github.com/QUST-AIBBDRC/SubMito-XGBoost/. 
Supplementary information Supplementary data are available at Bioinformatics online.</description><identifier>ISSN: 1367-4803</identifier><identifier>EISSN: 1460-2059</identifier><identifier>EISSN: 1367-4811</identifier><identifier>DOI: 10.1093/bioinformatics/btz734</identifier><identifier>PMID: 31603468</identifier><language>eng</language><publisher>England: Oxford University Press</publisher><ispartof>Bioinformatics, 2020-02, Vol.36 (4), p.1074-1081</ispartof><rights>The Author(s) 2019. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com 2019</rights><rights>The Author(s) 2019. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c463t-ecf550b43c98fabbda9728b5dfa2e76ab55f0a9e4c291c69405fcee41f93c6153</citedby><cites>FETCH-LOGICAL-c463t-ecf550b43c98fabbda9728b5dfa2e76ab55f0a9e4c291c69405fcee41f93c6153</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,1603,27922,27923</link.rule.ids><linktorsrc>$$Uhttps://dx.doi.org/10.1093/bioinformatics/btz734$$EView_record_in_Oxford_University_Press$$FView_record_in_$$GOxford_University_Press</linktorsrc><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/31603468$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><contributor>Hancock, John</contributor><creatorcontrib>Yu, Bin</creatorcontrib><creatorcontrib>Qiu, Wenying</creatorcontrib><creatorcontrib>Chen, Cheng</creatorcontrib><creatorcontrib>Ma, Anjun</creatorcontrib><creatorcontrib>Jiang, Jing</creatorcontrib><creatorcontrib>Zhou, Hongyan</creatorcontrib><creatorcontrib>Ma, Qin</creatorcontrib><title>SubMito-XGBoost: predicting 
protein submitochondrial localization by fusing multiple feature information and eXtreme gradient boosting</title><title>Bioinformatics</title><addtitle>Bioinformatics</addtitle><description>Abstract Motivation Mitochondria are an essential organelle in most eukaryotes. They not only play an important role in energy metabolism but also take part in many critical cytopathological processes. Abnormal mitochondria can trigger a series of human diseases, such as Parkinson's disease, multifactor disorder and Type-II diabetes. Protein submitochondrial localization enables the understanding of protein function in studying disease pathogenesis and drug design. Results We proposed a new method, SubMito-XGBoost, for protein submitochondrial localization prediction. Three steps are included: (i) the g-gap dipeptide composition (g-gap DC), pseudo-amino acid composition (PseAAC), auto-correlation function (ACF) and Bi-gram position-specific scoring matrix (Bi-gram PSSM) are employed to extract protein sequence features, (ii) Synthetic Minority Oversampling Technique (SMOTE) is used to balance samples, and the ReliefF algorithm is applied for feature selection and (iii) the obtained feature vectors are fed into XGBoost to predict protein submitochondrial locations. SubMito-XGBoost has obtained satisfactory prediction results by the leave-one-out-cross-validation (LOOCV) compared with existing methods. The prediction accuracies of the SubMito-XGBoost method on the two training datasets M317 and M983 were 97.7% and 98.9%, which are 2.8–12.5% and 3.8–9.9% higher than other methods, respectively. The prediction accuracy of the independent test set M495 was 94.8%, which is significantly better than the existing studies. The proposed method also achieves satisfactory predictive performance on plant and non-plant protein submitochondrial datasets. SubMito-XGBoost also plays an important role in new drug design for the treatment of related diseases. 
Availability and implementation The source codes and data are publicly available at https://github.com/QUST-AIBBDRC/SubMito-XGBoost/. Supplementary information Supplementary data are available at Bioinformatics online.</description><issn>1367-4803</issn><issn>1460-2059</issn><issn>1367-4811</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><recordid>eNqNkbFOwzAURS0EolD4BJBHllA7ttOGDRAUJBADIHWLbOe5GCVxsZ2hfADfjatAJTYmv-Hce_3eReiEknNKSjZR1tnOON_KaHWYqPg5ZXwHHVBekCwnotxNMyumGZ8RNkKHIbwTIijnfB-NGC0I48XsAH099-rRRpct5lfOhXiBVx5qq6Ptlml0EWyHQ6_axOg319XeygY3TsvGfqZo12G1xqYPG77tm2hXDWADMvYe8PaHCZNdjWERPbSAl17WFrqI1SYzSY_QnpFNgOOfd4xeb29eru-yh6f5_fXlQ6Z5wWIG2ghBFGe6nBmpVC3LaT5TojYyh2khlRCGyBK4zkuqi5ITYTQAp6ZkuqCCjdHZ4JtW--ghxKq1QUPTyA5cH6qcEUHyZMoTKgZUexeCB1OtvG2lX1eUVJsKqr8VVEMFSXf6E5GuBvVW9XvzBJABcP3qn57fO2qebQ</recordid><startdate>20200215</startdate><enddate>20200215</enddate><creator>Yu, Bin</creator><creator>Qiu, Wenying</creator><creator>Chen, Cheng</creator><creator>Ma, Anjun</creator><creator>Jiang, Jing</creator><creator>Zhou, Hongyan</creator><creator>Ma, Qin</creator><general>Oxford University Press</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope></search><sort><creationdate>20200215</creationdate><title>SubMito-XGBoost: predicting protein submitochondrial localization by fusing multiple feature information and eXtreme gradient boosting</title><author>Yu, Bin ; Qiu, Wenying ; Chen, Cheng ; Ma, Anjun ; Jiang, Jing ; Zhou, Hongyan ; Ma, Qin</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c463t-ecf550b43c98fabbda9728b5dfa2e76ab55f0a9e4c291c69405fcee41f93c6153</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Yu, Bin</creatorcontrib><creatorcontrib>Qiu, 
Wenying</creatorcontrib><creatorcontrib>Chen, Cheng</creatorcontrib><creatorcontrib>Ma, Anjun</creatorcontrib><creatorcontrib>Jiang, Jing</creatorcontrib><creatorcontrib>Zhou, Hongyan</creatorcontrib><creatorcontrib>Ma, Qin</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><jtitle>Bioinformatics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Yu, Bin</au><au>Qiu, Wenying</au><au>Chen, Cheng</au><au>Ma, Anjun</au><au>Jiang, Jing</au><au>Zhou, Hongyan</au><au>Ma, Qin</au><au>Hancock, John</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>SubMito-XGBoost: predicting protein submitochondrial localization by fusing multiple feature information and eXtreme gradient boosting</atitle><jtitle>Bioinformatics</jtitle><addtitle>Bioinformatics</addtitle><date>2020-02-15</date><risdate>2020</risdate><volume>36</volume><issue>4</issue><spage>1074</spage><epage>1081</epage><pages>1074-1081</pages><issn>1367-4803</issn><eissn>1460-2059</eissn><eissn>1367-4811</eissn><abstract>Abstract Motivation Mitochondria are an essential organelle in most eukaryotes. They not only play an important role in energy metabolism but also take part in many critical cytopathological processes. Abnormal mitochondria can trigger a series of human diseases, such as Parkinson's disease, multifactor disorder and Type-II diabetes. Protein submitochondrial localization enables the understanding of protein function in studying disease pathogenesis and drug design. Results We proposed a new method, SubMito-XGBoost, for protein submitochondrial localization prediction. 
Three steps are included: (i) the g-gap dipeptide composition (g-gap DC), pseudo-amino acid composition (PseAAC), auto-correlation function (ACF) and Bi-gram position-specific scoring matrix (Bi-gram PSSM) are employed to extract protein sequence features, (ii) Synthetic Minority Oversampling Technique (SMOTE) is used to balance samples, and the ReliefF algorithm is applied for feature selection and (iii) the obtained feature vectors are fed into XGBoost to predict protein submitochondrial locations. SubMito-XGBoost has obtained satisfactory prediction results by the leave-one-out-cross-validation (LOOCV) compared with existing methods. The prediction accuracies of the SubMito-XGBoost method on the two training datasets M317 and M983 were 97.7% and 98.9%, which are 2.8–12.5% and 3.8–9.9% higher than other methods, respectively. The prediction accuracy of the independent test set M495 was 94.8%, which is significantly better than the existing studies. The proposed method also achieves satisfactory predictive performance on plant and non-plant protein submitochondrial datasets. SubMito-XGBoost also plays an important role in new drug design for the treatment of related diseases. Availability and implementation The source codes and data are publicly available at https://github.com/QUST-AIBBDRC/SubMito-XGBoost/. Supplementary information Supplementary data are available at Bioinformatics online.</abstract><cop>England</cop><pub>Oxford University Press</pub><pmid>31603468</pmid><doi>10.1093/bioinformatics/btz734</doi><tpages>8</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1367-4803
ispartof Bioinformatics, 2020-02, Vol.36 (4), p.1074-1081
issn 1367-4803
1460-2059
1367-4811
language eng
recordid cdi_proquest_miscellaneous_2305029724
source Oxford Journals Open Access Collection
title SubMito-XGBoost: predicting protein submitochondrial localization by fusing multiple feature information and eXtreme gradient boosting
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-13T21%3A56%3A21IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_TOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=SubMito-XGBoost:%20predicting%20protein%20submitochondrial%20localization%20by%20fusing%20multiple%20feature%20information%20and%20eXtreme%20gradient%20boosting&rft.jtitle=Bioinformatics&rft.au=Yu,%20Bin&rft.date=2020-02-15&rft.volume=36&rft.issue=4&rft.spage=1074&rft.epage=1081&rft.pages=1074-1081&rft.issn=1367-4803&rft.eissn=1460-2059&rft_id=info:doi/10.1093/bioinformatics/btz734&rft_dat=%3Cproquest_TOX%3E2305029724%3C/proquest_TOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2305029724&rft_id=info:pmid/31603468&rft_oup_id=10.1093/bioinformatics/btz734&rfr_iscdi=true