Multi-scale encoding of amino acid sequences for predicting protein interactions using gradient boosting decision tree

Nowadays a number of computational approaches have been developed to effectively and accurately predict protein interactions. However, most of these methods typically perform worse when other biological data sources (e.g., protein structure information, protein domains, or gene neighborhoods informa...

Detailed description

Bibliographic details
Published in: PloS one 2017-08, Vol.12 (8), p.e0181426-e0181426
Main authors: Zhou, Chang; Yu, Hua; Ding, Yijie; Guo, Fei; Gong, Xiu-Jun
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page e0181426
container_issue 8
container_start_page e0181426
container_title PloS one
container_volume 12
creator Zhou, Chang
Yu, Hua
Ding, Yijie
Guo, Fei
Gong, Xiu-Jun
description A number of computational approaches have been developed to predict protein interactions effectively and accurately. However, most of these methods perform worse when other biological data sources (e.g., protein structure information, protein domains, or gene neighborhood information) are not available. In the present work, we propose a method for predicting protein interactions that makes full use of the physicochemical characteristics of amino acids. A protein sequence is encoded at multiple scales by seven amino acid properties, including their qualitative and quantitative descriptions. Five kinds of protein descriptors (frequency, composition, transformation, distribution, and auto covariance) are extracted from these encodings to represent each protein sequence. The resulting feature representation, consisting of 347 dimensions, captures not only the compositional and positional information of amino acids in the sequence but also their statistical significance. Based on this feature representation, the gradient boosting decision tree algorithm is used to predict the protein interaction class. When tested on the PPI data of S. cerevisiae, the proposed method achieves a prediction accuracy of 95.28% with a Matthews correlation coefficient of 90.68%. Compared with state-of-the-art work on H. pylori and Human, the accuracies can be raised to 89.27% and 98.00%, respectively. Extensive experiments are also performed on a crossover protein-protein interaction network, and the prediction accuracies are very promising. Because of the learning capabilities of the gradient boosting decision tree and the multi-scale feature representation scheme, the proposed method might be a useful tool for future proteomics studies.
doi_str_mv 10.1371/journal.pone.0181426
format Article
fullrecord <record><control><sourceid>gale_plos_</sourceid><recordid>TN_cdi_plos_journals_1927151119</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A500114370</galeid><doaj_id>oai_doaj_org_article_5cecadb5482745caa649aa01a631b522</doaj_id><sourcerecordid>A500114370</sourcerecordid><originalsourceid>FETCH-LOGICAL-c692t-209435cc91d91e08007ea66f1392a5d64ff9d0e871a0f19a8a0b896d734094cc3</originalsourceid><addsrcrecordid>eNqNk12L1DAUhoso7jr6D0QLgujFjPlo2uZGWBY_BlYW_LoNZ5LTmQydZkzSRf-96Ux3mcpeSC4SznnOm-RNTpY9p2RBeUXfbV3vO2gXe9fhgtCaFqx8kJ1Tydm8ZIQ_PFmfZU9C2BIieF2Wj7MzVleSCcLPs5svfRvtPGhoMcdOO2O7de6aHHa2czloa_KAv_qUwpA3zud7j8bqOGB77yLaLrddRA8p5rqQ92FIrT0Yi13MV86FA2xQ25CIPHrEp9mjBtqAz8Z5lv34-OH75ef51fWn5eXF1VyXksU5I7LgQmtJjaRIakIqhLJsKJcMhCmLppGGYF1RIA2VUANZ1bI0FS9SpdZ8lr086u5bF9RoWVBUsooKSpNBs2x5JIyDrdp7uwP_Rzmw6hBwfq3AR6tbVEKjBrMSRc2qQmiAspAAhELJ6UowlrTej7v1qx0ane7voZ2ITjOd3ai1u1FCFLKiNAm8GQW8S56HqHY2aGxb6ND1x3MLWVRsOPerf9D7bzdS6_S-ynaNS_vqQVRdCEIoLXhFErW4h0rD4M7q9L8am-KTgreTgsRE_B3X0Ieglt--_j97_XPKvj5hNwht3ATX9oefNQWLI6i9C8Fjc2cyJWpoj1s31NAeamyPVPbi9IHuim77gf8FRdULRQ</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1927151119</pqid></control><display><type>article</type><title>Multi-scale encoding of amino acid sequences for predicting protein interactions using gradient boosting decision tree</title><source>MEDLINE</source><source>DOAJ Directory of Open Access Journals</source><source>Public Library of Science (PLoS) Journals Open Access</source><source>EZB-FREE-00999 freely available EZB journals</source><source>PubMed Central</source><source>Free Full-Text Journals in Chemistry</source><creator>Zhou, Chang ; Yu, Hua ; Ding, Yijie ; Guo, Fei ; Gong, Xiu-Jun</creator><contributor>Liu, Bin</contributor><creatorcontrib>Zhou, Chang ; Yu, Hua ; Ding, Yijie ; Guo, Fei ; Gong, Xiu-Jun ; Liu, Bin</creatorcontrib><description>Nowadays a number of computational approaches have been 
developed to effectively and accurately predict protein interactions. However, most of these methods typically perform worse when other biological data sources (e.g., protein structure information, protein domains, or gene neighborhoods information) are not available. In the present work, we propose a method for predicting protein interactions making full use of physicochemical characteristics of amino acids. A protein sequence is encoded at multi-scale by seven properties, including their qualitative and quantitative descriptions, of amino acids. Five kinds of protein descriptors, frequency, composition, transformation, distribution and auto covariance, are extracted from these encodings for representing each protein sequence. The new formed feature representation consisted of 347 dimensions is able to capture not only the compositional and positional information but also their statistical significance of amino acids in the sequence. Based on such a feature representation, the gradient boosting decision tree algorithm is introduced to predict protein interaction class. When the proposed method is tested with the PPI data of S.cerevisiae, it achieves a prediction accuracy of 95.28% at the Matthew's correlation coefficient of 90.68%. Compared with the state-of-the-art works on H.pylori and Human, the accuracies can be raised to 89.27% and 98.00% respectively. Extensive experiments are performed for a crossover protein-protein interactions network and the prediction accuracies are also very promising. 
Because of learning capabilities of the gradient boosting decision tree and the mutil-scale feature representation scheme, the proposed method might be a useful tool for future proteomics studies.</description><identifier>ISSN: 1932-6203</identifier><identifier>EISSN: 1932-6203</identifier><identifier>DOI: 10.1371/journal.pone.0181426</identifier><identifier>PMID: 28792503</identifier><language>eng</language><publisher>United States: Public Library of Science</publisher><subject>Algorithms ; Amino Acid Sequence ; Amino acid sequencing ; Amino acids ; Artificial intelligence ; Bacterial Proteins - genetics ; Bacterial Proteins - metabolism ; Bioinformatics ; Biology and Life Sciences ; Classification ; Computational Biology ; Computer and Information Sciences ; Computer applications ; Computer science ; Correlation coefficient ; Correlation coefficients ; Covariance ; Datasets as Topic ; Decision Trees ; Deoxyribonucleic acid ; DNA ; Engineering and Technology ; Genetic transformation ; Helicobacter pylori ; Humans ; Identification ; Laboratories ; Methods ; Physical Sciences ; Predictions ; Protein composition ; Protein interaction ; Protein Interaction Mapping - methods ; Protein structure ; Protein-protein interactions ; Proteins ; Proteomics ; Research and Analysis Methods ; Saccharomyces cerevisiae ; Saccharomyces cerevisiae Proteins - genetics ; Saccharomyces cerevisiae Proteins - metabolism ; Transformation ; Wnt Proteins - genetics ; Wnt Proteins - metabolism</subject><ispartof>PloS one, 2017-08, Vol.12 (8), p.e0181426-e0181426</ispartof><rights>COPYRIGHT 2017 Public Library of Science</rights><rights>2017 Zhou et al. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><rights>2017 Zhou et al 2017 Zhou et al</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c692t-209435cc91d91e08007ea66f1392a5d64ff9d0e871a0f19a8a0b896d734094cc3</citedby><cites>FETCH-LOGICAL-c692t-209435cc91d91e08007ea66f1392a5d64ff9d0e871a0f19a8a0b896d734094cc3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC5549711/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC5549711/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,723,776,780,860,881,2095,2914,23846,27903,27904,53769,53771,79346,79347</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/28792503$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><contributor>Liu, Bin</contributor><creatorcontrib>Zhou, Chang</creatorcontrib><creatorcontrib>Yu, Hua</creatorcontrib><creatorcontrib>Ding, Yijie</creatorcontrib><creatorcontrib>Guo, Fei</creatorcontrib><creatorcontrib>Gong, Xiu-Jun</creatorcontrib><title>Multi-scale encoding of amino acid sequences for predicting protein interactions using gradient boosting decision tree</title><title>PloS one</title><addtitle>PLoS One</addtitle><description>Nowadays a number of computational approaches have been developed to effectively and accurately predict protein interactions. However, most of these methods typically perform worse when other biological data sources (e.g., protein structure information, protein domains, or gene neighborhoods information) are not available. 
In the present work, we propose a method for predicting protein interactions making full use of physicochemical characteristics of amino acids. A protein sequence is encoded at multi-scale by seven properties, including their qualitative and quantitative descriptions, of amino acids. Five kinds of protein descriptors, frequency, composition, transformation, distribution and auto covariance, are extracted from these encodings for representing each protein sequence. The new formed feature representation consisted of 347 dimensions is able to capture not only the compositional and positional information but also their statistical significance of amino acids in the sequence. Based on such a feature representation, the gradient boosting decision tree algorithm is introduced to predict protein interaction class. When the proposed method is tested with the PPI data of S.cerevisiae, it achieves a prediction accuracy of 95.28% at the Matthew's correlation coefficient of 90.68%. Compared with the state-of-the-art works on H.pylori and Human, the accuracies can be raised to 89.27% and 98.00% respectively. Extensive experiments are performed for a crossover protein-protein interactions network and the prediction accuracies are also very promising. 
Because of learning capabilities of the gradient boosting decision tree and the mutil-scale feature representation scheme, the proposed method might be a useful tool for future proteomics studies.</description><subject>Algorithms</subject><subject>Amino Acid Sequence</subject><subject>Amino acid sequencing</subject><subject>Amino acids</subject><subject>Artificial intelligence</subject><subject>Bacterial Proteins - genetics</subject><subject>Bacterial Proteins - metabolism</subject><subject>Bioinformatics</subject><subject>Biology and Life Sciences</subject><subject>Classification</subject><subject>Computational Biology</subject><subject>Computer and Information Sciences</subject><subject>Computer applications</subject><subject>Computer science</subject><subject>Correlation coefficient</subject><subject>Correlation coefficients</subject><subject>Covariance</subject><subject>Datasets as Topic</subject><subject>Decision Trees</subject><subject>Deoxyribonucleic acid</subject><subject>DNA</subject><subject>Engineering and Technology</subject><subject>Genetic transformation</subject><subject>Helicobacter pylori</subject><subject>Humans</subject><subject>Identification</subject><subject>Laboratories</subject><subject>Methods</subject><subject>Physical Sciences</subject><subject>Predictions</subject><subject>Protein composition</subject><subject>Protein interaction</subject><subject>Protein Interaction Mapping - methods</subject><subject>Protein structure</subject><subject>Protein-protein interactions</subject><subject>Proteins</subject><subject>Proteomics</subject><subject>Research and Analysis Methods</subject><subject>Saccharomyces cerevisiae</subject><subject>Saccharomyces cerevisiae Proteins - genetics</subject><subject>Saccharomyces cerevisiae Proteins - metabolism</subject><subject>Transformation</subject><subject>Wnt Proteins - genetics</subject><subject>Wnt Proteins - 
metabolism</subject><issn>1932-6203</issn><issn>1932-6203</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2017</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><sourceid>DOA</sourceid><recordid>eNqNk12L1DAUhoso7jr6D0QLgujFjPlo2uZGWBY_BlYW_LoNZ5LTmQydZkzSRf-96Ux3mcpeSC4SznnOm-RNTpY9p2RBeUXfbV3vO2gXe9fhgtCaFqx8kJ1Tydm8ZIQ_PFmfZU9C2BIieF2Wj7MzVleSCcLPs5svfRvtPGhoMcdOO2O7de6aHHa2czloa_KAv_qUwpA3zud7j8bqOGB77yLaLrddRA8p5rqQ92FIrT0Yi13MV86FA2xQ25CIPHrEp9mjBtqAz8Z5lv34-OH75ef51fWn5eXF1VyXksU5I7LgQmtJjaRIakIqhLJsKJcMhCmLppGGYF1RIA2VUANZ1bI0FS9SpdZ8lr086u5bF9RoWVBUsooKSpNBs2x5JIyDrdp7uwP_Rzmw6hBwfq3AR6tbVEKjBrMSRc2qQmiAspAAhELJ6UowlrTej7v1qx0ane7voZ2ITjOd3ai1u1FCFLKiNAm8GQW8S56HqHY2aGxb6ND1x3MLWVRsOPerf9D7bzdS6_S-ynaNS_vqQVRdCEIoLXhFErW4h0rD4M7q9L8am-KTgreTgsRE_B3X0Ieglt--_j97_XPKvj5hNwht3ATX9oefNQWLI6i9C8Fjc2cyJWpoj1s31NAeamyPVPbi9IHuim77gf8FRdULRQ</recordid><startdate>20170808</startdate><enddate>20170808</enddate><creator>Zhou, Chang</creator><creator>Yu, Hua</creator><creator>Ding, Yijie</creator><creator>Guo, Fei</creator><creator>Gong, Xiu-Jun</creator><general>Public Library of Science</general><general>Public Library of Science 
(PLoS)</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>IOV</scope><scope>ISR</scope><scope>3V.</scope><scope>7QG</scope><scope>7QL</scope><scope>7QO</scope><scope>7RV</scope><scope>7SN</scope><scope>7SS</scope><scope>7T5</scope><scope>7TG</scope><scope>7TM</scope><scope>7U9</scope><scope>7X2</scope><scope>7X7</scope><scope>7XB</scope><scope>88E</scope><scope>8AO</scope><scope>8C1</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FH</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AEUYN</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>ATCPS</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>BHPHI</scope><scope>C1K</scope><scope>CCPQU</scope><scope>D1I</scope><scope>DWQXO</scope><scope>FR3</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>GNUQQ</scope><scope>H94</scope><scope>HCIFZ</scope><scope>K9.</scope><scope>KB.</scope><scope>KB0</scope><scope>KL.</scope><scope>L6V</scope><scope>LK8</scope><scope>M0K</scope><scope>M0S</scope><scope>M1P</scope><scope>M7N</scope><scope>M7P</scope><scope>M7S</scope><scope>NAPCQ</scope><scope>P5Z</scope><scope>P62</scope><scope>P64</scope><scope>PATMY</scope><scope>PDBOC</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PTHSS</scope><scope>PYCSY</scope><scope>RC3</scope><scope>7X8</scope><scope>5PM</scope><scope>DOA</scope></search><sort><creationdate>20170808</creationdate><title>Multi-scale encoding of amino acid sequences for predicting protein interactions using gradient boosting decision tree</title><author>Zhou, Chang ; Yu, Hua ; Ding, Yijie ; Guo, Fei ; Gong, 
Xiu-Jun</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c692t-209435cc91d91e08007ea66f1392a5d64ff9d0e871a0f19a8a0b896d734094cc3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2017</creationdate><topic>Algorithms</topic><topic>Amino Acid Sequence</topic><topic>Amino acid sequencing</topic><topic>Amino acids</topic><topic>Artificial intelligence</topic><topic>Bacterial Proteins - genetics</topic><topic>Bacterial Proteins - metabolism</topic><topic>Bioinformatics</topic><topic>Biology and Life Sciences</topic><topic>Classification</topic><topic>Computational Biology</topic><topic>Computer and Information Sciences</topic><topic>Computer applications</topic><topic>Computer science</topic><topic>Correlation coefficient</topic><topic>Correlation coefficients</topic><topic>Covariance</topic><topic>Datasets as Topic</topic><topic>Decision Trees</topic><topic>Deoxyribonucleic acid</topic><topic>DNA</topic><topic>Engineering and Technology</topic><topic>Genetic transformation</topic><topic>Helicobacter pylori</topic><topic>Humans</topic><topic>Identification</topic><topic>Laboratories</topic><topic>Methods</topic><topic>Physical Sciences</topic><topic>Predictions</topic><topic>Protein composition</topic><topic>Protein interaction</topic><topic>Protein Interaction Mapping - methods</topic><topic>Protein structure</topic><topic>Protein-protein interactions</topic><topic>Proteins</topic><topic>Proteomics</topic><topic>Research and Analysis Methods</topic><topic>Saccharomyces cerevisiae</topic><topic>Saccharomyces cerevisiae Proteins - genetics</topic><topic>Saccharomyces cerevisiae Proteins - metabolism</topic><topic>Transformation</topic><topic>Wnt Proteins - genetics</topic><topic>Wnt Proteins - metabolism</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Zhou, Chang</creatorcontrib><creatorcontrib>Yu, Hua</creatorcontrib><creatorcontrib>Ding, 
Yijie</creatorcontrib><creatorcontrib>Guo, Fei</creatorcontrib><creatorcontrib>Gong, Xiu-Jun</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Gale In Context: Opposing Viewpoints</collection><collection>Gale In Context: Science</collection><collection>ProQuest Central (Corporate)</collection><collection>Animal Behavior Abstracts</collection><collection>Bacteriology Abstracts (Microbiology B)</collection><collection>Biotechnology Research Abstracts</collection><collection>Nursing &amp; Allied Health Database</collection><collection>Ecology Abstracts</collection><collection>Entomology Abstracts (Full archive)</collection><collection>Immunology Abstracts</collection><collection>Meteorological &amp; Geoastrophysical Abstracts</collection><collection>Nucleic Acids Abstracts</collection><collection>Virology and AIDS Abstracts</collection><collection>Agricultural Science Collection</collection><collection>Health &amp; Medical Collection</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Medical Database (Alumni Edition)</collection><collection>ProQuest Pharma Collection</collection><collection>Public Health Database</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>Materials Science &amp; Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest One 
Sustainability</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies &amp; Aerospace Collection</collection><collection>Agricultural &amp; Environmental Science Collection</collection><collection>ProQuest Central Essentials</collection><collection>Biological Science Collection</collection><collection>ProQuest Central</collection><collection>Technology Collection (ProQuest)</collection><collection>Natural Science Collection (ProQuest)</collection><collection>Environmental Sciences and Pollution Management</collection><collection>ProQuest One Community College</collection><collection>ProQuest Materials Science Collection</collection><collection>ProQuest Central Korea</collection><collection>Engineering Research Database</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Central Student</collection><collection>AIDS and Cancer Research Abstracts</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>Materials Science Database</collection><collection>Nursing &amp; Allied Health Database (Alumni Edition)</collection><collection>Meteorological &amp; Geoastrophysical Abstracts - Academic</collection><collection>ProQuest Engineering Collection</collection><collection>ProQuest Biological Science Collection</collection><collection>Agricultural Science Database</collection><collection>Health &amp; Medical Collection (Alumni Edition)</collection><collection>Medical Database</collection><collection>Algology Mycology and Protozoology Abstracts (Microbiology C)</collection><collection>Biological Science Database</collection><collection>Engineering Database</collection><collection>Nursing &amp; Allied Health Premium</collection><collection>Advanced Technologies &amp; Aerospace Database</collection><collection>ProQuest Advanced Technologies 
&amp; Aerospace Collection</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Environmental Science Database</collection><collection>Materials Science Collection</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>Engineering Collection</collection><collection>Environmental Science Collection</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>PloS one</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Zhou, Chang</au><au>Yu, Hua</au><au>Ding, Yijie</au><au>Guo, Fei</au><au>Gong, Xiu-Jun</au><au>Liu, Bin</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Multi-scale encoding of amino acid sequences for predicting protein interactions using gradient boosting decision tree</atitle><jtitle>PloS one</jtitle><addtitle>PLoS One</addtitle><date>2017-08-08</date><risdate>2017</risdate><volume>12</volume><issue>8</issue><spage>e0181426</spage><epage>e0181426</epage><pages>e0181426-e0181426</pages><issn>1932-6203</issn><eissn>1932-6203</eissn><abstract>Nowadays a number of computational approaches have been developed to effectively and accurately predict protein interactions. However, most of these methods typically perform worse when other biological data sources (e.g., protein structure information, protein domains, or gene neighborhoods information) are not available. In the present work, we propose a method for predicting protein interactions making full use of physicochemical characteristics of amino acids. 
A protein sequence is encoded at multi-scale by seven properties, including their qualitative and quantitative descriptions, of amino acids. Five kinds of protein descriptors, frequency, composition, transformation, distribution and auto covariance, are extracted from these encodings for representing each protein sequence. The new formed feature representation consisted of 347 dimensions is able to capture not only the compositional and positional information but also their statistical significance of amino acids in the sequence. Based on such a feature representation, the gradient boosting decision tree algorithm is introduced to predict protein interaction class. When the proposed method is tested with the PPI data of S.cerevisiae, it achieves a prediction accuracy of 95.28% at the Matthew's correlation coefficient of 90.68%. Compared with the state-of-the-art works on H.pylori and Human, the accuracies can be raised to 89.27% and 98.00% respectively. Extensive experiments are performed for a crossover protein-protein interactions network and the prediction accuracies are also very promising. Because of learning capabilities of the gradient boosting decision tree and the mutil-scale feature representation scheme, the proposed method might be a useful tool for future proteomics studies.</abstract><cop>United States</cop><pub>Public Library of Science</pub><pmid>28792503</pmid><doi>10.1371/journal.pone.0181426</doi><tpages>e0181426</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1932-6203
ispartof PloS one, 2017-08, Vol.12 (8), p.e0181426-e0181426
issn 1932-6203
1932-6203
language eng
recordid cdi_plos_journals_1927151119
source MEDLINE; DOAJ Directory of Open Access Journals; Public Library of Science (PLoS) Journals Open Access; EZB-FREE-00999 freely available EZB journals; PubMed Central; Free Full-Text Journals in Chemistry
subjects Algorithms
Amino Acid Sequence
Amino acid sequencing
Amino acids
Artificial intelligence
Bacterial Proteins - genetics
Bacterial Proteins - metabolism
Bioinformatics
Biology and Life Sciences
Classification
Computational Biology
Computer and Information Sciences
Computer applications
Computer science
Correlation coefficient
Correlation coefficients
Covariance
Datasets as Topic
Decision Trees
Deoxyribonucleic acid
DNA
Engineering and Technology
Genetic transformation
Helicobacter pylori
Humans
Identification
Laboratories
Methods
Physical Sciences
Predictions
Protein composition
Protein interaction
Protein Interaction Mapping - methods
Protein structure
Protein-protein interactions
Proteins
Proteomics
Research and Analysis Methods
Saccharomyces cerevisiae
Saccharomyces cerevisiae Proteins - genetics
Saccharomyces cerevisiae Proteins - metabolism
Transformation
Wnt Proteins - genetics
Wnt Proteins - metabolism
title Multi-scale encoding of amino acid sequences for predicting protein interactions using gradient boosting decision tree
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-24T03%3A51%3A43IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_plos_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Multi-scale%20encoding%20of%20amino%20acid%20sequences%20for%20predicting%20protein%20interactions%20using%20gradient%20boosting%20decision%20tree&rft.jtitle=PloS%20one&rft.au=Zhou,%20Chang&rft.date=2017-08-08&rft.volume=12&rft.issue=8&rft.spage=e0181426&rft.epage=e0181426&rft.pages=e0181426-e0181426&rft.issn=1932-6203&rft.eissn=1932-6203&rft_id=info:doi/10.1371/journal.pone.0181426&rft_dat=%3Cgale_plos_%3EA500114370%3C/gale_plos_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1927151119&rft_id=info:pmid/28792503&rft_galeid=A500114370&rft_doaj_id=oai_doaj_org_article_5cecadb5482745caa649aa01a631b522&rfr_iscdi=true
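
The abstract in this record names sequence descriptors (composition, transformation, auto covariance) and reports a Matthews correlation coefficient alongside accuracy. The sketch below illustrates what such quantities might look like for one amino acid property. It is not the authors' implementation. The three-class hydrophobicity grouping, the reading of "transformation" as class-to-class transitions between neighboring residues, and all function names are assumptions made for illustration. The record does not specify the seven properties or how the 347 dimensions are assembled.

```python
import math
from collections import Counter

# Hypothetical three-class grouping of amino acids by hydrophobicity.
# This grouping is an illustrative assumption, not taken from the record.
GROUPS = {
    "1": "RKEDQN",   # polar
    "2": "GASTPHY",  # neutral
    "3": "CLVIMFW",  # hydrophobic
}
AA_TO_CLASS = {aa: cls for cls, aas in GROUPS.items() for aa in aas}


def encode(sequence):
    """Map each residue to its class label, skipping unknown letters."""
    return [AA_TO_CLASS[aa] for aa in sequence.upper() if aa in AA_TO_CLASS]


def composition(classes):
    """Fraction of residues that fall in each class (classes in sorted order)."""
    counts = Counter(classes)
    n = len(classes)
    return [counts[c] / n for c in sorted(GROUPS)]


def transition(classes):
    """Fraction of adjacent residue pairs that switch between two classes."""
    pairs = [("1", "2"), ("1", "3"), ("2", "3")]
    n = len(classes) - 1
    counts = Counter(
        frozenset(p) for p in zip(classes, classes[1:]) if p[0] != p[1]
    )
    return [counts[frozenset(p)] / n for p in pairs]


def auto_covariance(values, max_lag):
    """Auto covariance of a numeric property profile for lags 1..max_lag.

    Assumes len(values) > max_lag.
    """
    n = len(values)
    mean = sum(values) / n
    return [
        sum((values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag))
        / (n - lag)
        for lag in range(1, max_lag + 1)
    ]


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

In a pipeline of the kind the abstract describes, vectors like these would be computed per property, concatenated for both proteins of a candidate pair, and passed to a gradient boosting classifier. That wiring is omitted here because the record gives no details about it.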