DeepProSite: structure-aware protein binding site prediction using ESMFold and pretrained language model

Abstract. Motivation: Identifying the functional sites of a protein, such as the binding sites of proteins, peptides, or other biological components, is crucial for understanding related biological processes and drug design. However, existing sequence-based methods have limited predictive accuracy, as...

Bibliographic details
Published in: Bioinformatics (Oxford, England), 2023-12, Vol.39 (12)
Main authors: Fang, Yitian, Jiang, Yi, Wei, Leyi, Ma, Qin, Ren, Zhixiang, Yuan, Qianmu, Wei, Dong-Qing
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page
container_issue 12
container_start_page
container_title Bioinformatics (Oxford, England)
container_volume 39
creator Fang, Yitian
Jiang, Yi
Wei, Leyi
Ma, Qin
Ren, Zhixiang
Yuan, Qianmu
Wei, Dong-Qing
description Abstract. Motivation: Identifying the functional sites of a protein, such as the binding sites of proteins, peptides, or other biological components, is crucial for understanding related biological processes and drug design. However, existing sequence-based methods have limited predictive accuracy, as they only consider sequence-adjacent contextual features and lack structural information. Results: In this study, DeepProSite is presented as a new framework for identifying protein binding sites that utilizes protein structure and sequence information. DeepProSite first generates protein structures from ESMFold and sequence representations from pretrained language models. It then uses a Graph Transformer and formulates binding site prediction as graph node classification. In predicting protein–protein/peptide binding sites, DeepProSite outperforms state-of-the-art sequence- and structure-based methods on most metrics. Moreover, DeepProSite maintains its performance when predicting unbound structures, in contrast to competing structure-based prediction methods. DeepProSite is also extended to the prediction of binding sites for nucleic acids and other ligands, verifying its generalization capability. Finally, an online server for predicting multiple types of residue is established as the implementation of the proposed DeepProSite. Availability and implementation: The datasets and source code can be accessed at https://github.com/WeiLab-Biology/DeepProSite. The proposed DeepProSite can be accessed at https://inner.wei-group.net/DeepProSite/.
doi_str_mv 10.1093/bioinformatics/btad718
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_2895261823</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><oup_id>10.1093/bioinformatics/btad718</oup_id><sourcerecordid>2895261823</sourcerecordid><originalsourceid>FETCH-LOGICAL-c401t-e9f1d3aad8ac9185799c4c17dd57ed124b22473d679bcfcef4aa9145ca9905193</originalsourceid><addsrcrecordid>eNqNkEFPwzAMhSMEYjD4C1OOXMritmkbbmhsgDQE0uBcpYk7gtpmJKkQ_55OGwhunGw9f362HiETYJfARDKtjDVdbV0rg1F-WgWpcygOyAkkWR6lBcDhr35ETr1_Y4xxxrNjMkoKBrzI4xPyeoO4eXJ2ZQJeUR9cr0LvMJIf0iHdOBvQdLQynTbdmvqBGkTURgVjO9r7rTpfPSxso6ns9HYYnDQdatrIbt3LNdLWamzOyFEtG4_n-zomL4v58-wuWj7e3s-ul5FKGYQIRQ06kVIXUgkoeC6EShXkWvMcNcRpFcdpnugsF5WqFdaplAJSrqQQjINIxuRi5zv8_t6jD2VrvMJm-AZt78u4EDzOoIiTAc12qHLWe4d1uXGmle6zBFZuUy7_plzuUx4WJ_sbfdWi_ln7jnUAYAfYfvNf0y_LOJJU</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2895261823</pqid></control><display><type>article</type><title>DeepProSite: structure-aware protein binding site prediction using ESMFold and pretrained language model</title><source>MEDLINE</source><source>PubMed Central</source><source>Directory of Open Access Journals</source><source>Alma/SFX Local Collection</source><source>EZB Electronic Journals Library</source><source>Oxford Academic Journals (Open Access)</source><creator>Fang, Yitian ; Jiang, Yi ; Wei, Leyi ; Ma, Qin ; Ren, Zhixiang ; Yuan, Qianmu ; Wei, Dong-Qing</creator><contributor>Cowen, Lenore</contributor><creatorcontrib>Fang, Yitian ; Jiang, Yi ; Wei, Leyi ; Ma, Qin ; Ren, Zhixiang ; Yuan, Qianmu ; Wei, Dong-Qing ; Cowen, Lenore</creatorcontrib><description>Abstract Motivation Identifying the functional sites of a protein, such as the binding sites of proteins, peptides, or other biological components, is crucial for understanding related biological processes and drug design. 
However, existing sequence-based methods have limited predictive accuracy, as they only consider sequence-adjacent contextual features and lack structural information. Results In this study, DeepProSite is presented as a new framework for identifying protein binding site that utilizes protein structure and sequence information. DeepProSite first generates protein structures from ESMFold and sequence representations from pretrained language models. It then uses Graph Transformer and formulates binding site predictions as graph node classifications. In predicting protein–protein/peptide binding sites, DeepProSite outperforms state-of-the-art sequence- and structure-based methods on most metrics. Moreover, DeepProSite maintains its performance when predicting unbound structures, in contrast to competing structure-based prediction methods. DeepProSite is also extended to the prediction of binding sites for nucleic acids and other ligands, verifying its generalization capability. Finally, an online server for predicting multiple types of residue is established as the implementation of the proposed DeepProSite. Availability and implementation The datasets and source codes can be accessed at https://github.com/WeiLab-Biology/DeepProSite. The proposed DeepProSite can be accessed at https://inner.wei-group.net/DeepProSite/.</description><identifier>ISSN: 1367-4811</identifier><identifier>EISSN: 1367-4811</identifier><identifier>DOI: 10.1093/bioinformatics/btad718</identifier><identifier>PMID: 38015872</identifier><language>eng</language><publisher>England: Oxford University Press</publisher><subject>Binding Sites ; Peptides ; Protein Binding ; Proteins - chemistry ; Software</subject><ispartof>Bioinformatics (Oxford, England), 2023-12, Vol.39 (12)</ispartof><rights>The Author(s) 2023. Published by Oxford University Press. 2023</rights><rights>The Author(s) 2023. 
Published by Oxford University Press.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c401t-e9f1d3aad8ac9185799c4c17dd57ed124b22473d679bcfcef4aa9145ca9905193</citedby><cites>FETCH-LOGICAL-c401t-e9f1d3aad8ac9185799c4c17dd57ed124b22473d679bcfcef4aa9145ca9905193</cites><orcidid>0000-0002-4644-1464 ; 0000-0003-1444-190X ; 0000-0003-4200-7502</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,864,27923,27924</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/38015872$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><contributor>Cowen, Lenore</contributor><creatorcontrib>Fang, Yitian</creatorcontrib><creatorcontrib>Jiang, Yi</creatorcontrib><creatorcontrib>Wei, Leyi</creatorcontrib><creatorcontrib>Ma, Qin</creatorcontrib><creatorcontrib>Ren, Zhixiang</creatorcontrib><creatorcontrib>Yuan, Qianmu</creatorcontrib><creatorcontrib>Wei, Dong-Qing</creatorcontrib><title>DeepProSite: structure-aware protein binding site prediction using ESMFold and pretrained language model</title><title>Bioinformatics (Oxford, England)</title><addtitle>Bioinformatics</addtitle><description>Abstract Motivation Identifying the functional sites of a protein, such as the binding sites of proteins, peptides, or other biological components, is crucial for understanding related biological processes and drug design. However, existing sequence-based methods have limited predictive accuracy, as they only consider sequence-adjacent contextual features and lack structural information. Results In this study, DeepProSite is presented as a new framework for identifying protein binding site that utilizes protein structure and sequence information. 
DeepProSite first generates protein structures from ESMFold and sequence representations from pretrained language models. It then uses Graph Transformer and formulates binding site predictions as graph node classifications. In predicting protein–protein/peptide binding sites, DeepProSite outperforms state-of-the-art sequence- and structure-based methods on most metrics. Moreover, DeepProSite maintains its performance when predicting unbound structures, in contrast to competing structure-based prediction methods. DeepProSite is also extended to the prediction of binding sites for nucleic acids and other ligands, verifying its generalization capability. Finally, an online server for predicting multiple types of residue is established as the implementation of the proposed DeepProSite. Availability and implementation The datasets and source codes can be accessed at https://github.com/WeiLab-Biology/DeepProSite. The proposed DeepProSite can be accessed at https://inner.wei-group.net/DeepProSite/.</description><subject>Binding Sites</subject><subject>Peptides</subject><subject>Protein Binding</subject><subject>Proteins - chemistry</subject><subject>Software</subject><issn>1367-4811</issn><issn>1367-4811</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>TOX</sourceid><sourceid>EIF</sourceid><recordid>eNqNkEFPwzAMhSMEYjD4C1OOXMritmkbbmhsgDQE0uBcpYk7gtpmJKkQ_55OGwhunGw9f362HiETYJfARDKtjDVdbV0rg1F-WgWpcygOyAkkWR6lBcDhr35ETr1_Y4xxxrNjMkoKBrzI4xPyeoO4eXJ2ZQJeUR9cr0LvMJIf0iHdOBvQdLQynTbdmvqBGkTURgVjO9r7rTpfPSxso6ns9HYYnDQdatrIbt3LNdLWamzOyFEtG4_n-zomL4v58-wuWj7e3s-ul5FKGYQIRQ06kVIXUgkoeC6EShXkWvMcNcRpFcdpnugsF5WqFdaplAJSrqQQjINIxuRi5zv8_t6jD2VrvMJm-AZt78u4EDzOoIiTAc12qHLWe4d1uXGmle6zBFZuUy7_plzuUx4WJ_sbfdWi_ln7jnUAYAfYfvNf0y_LOJJU</recordid><startdate>20231201</startdate><enddate>20231201</enddate><creator>Fang, Yitian</creator><creator>Jiang, Yi</creator><creator>Wei, Leyi</creator><creator>Ma, 
Qin</creator><creator>Ren, Zhixiang</creator><creator>Yuan, Qianmu</creator><creator>Wei, Dong-Qing</creator><general>Oxford University Press</general><scope>TOX</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0002-4644-1464</orcidid><orcidid>https://orcid.org/0000-0003-1444-190X</orcidid><orcidid>https://orcid.org/0000-0003-4200-7502</orcidid></search><sort><creationdate>20231201</creationdate><title>DeepProSite: structure-aware protein binding site prediction using ESMFold and pretrained language model</title><author>Fang, Yitian ; Jiang, Yi ; Wei, Leyi ; Ma, Qin ; Ren, Zhixiang ; Yuan, Qianmu ; Wei, Dong-Qing</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c401t-e9f1d3aad8ac9185799c4c17dd57ed124b22473d679bcfcef4aa9145ca9905193</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Binding Sites</topic><topic>Peptides</topic><topic>Protein Binding</topic><topic>Proteins - chemistry</topic><topic>Software</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Fang, Yitian</creatorcontrib><creatorcontrib>Jiang, Yi</creatorcontrib><creatorcontrib>Wei, Leyi</creatorcontrib><creatorcontrib>Ma, Qin</creatorcontrib><creatorcontrib>Ren, Zhixiang</creatorcontrib><creatorcontrib>Yuan, Qianmu</creatorcontrib><creatorcontrib>Wei, Dong-Qing</creatorcontrib><collection>Oxford Academic Journals (Open Access)</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><jtitle>Bioinformatics (Oxford, England)</jtitle></facets><delivery><delcategory>Remote 
Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Fang, Yitian</au><au>Jiang, Yi</au><au>Wei, Leyi</au><au>Ma, Qin</au><au>Ren, Zhixiang</au><au>Yuan, Qianmu</au><au>Wei, Dong-Qing</au><au>Cowen, Lenore</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>DeepProSite: structure-aware protein binding site prediction using ESMFold and pretrained language model</atitle><jtitle>Bioinformatics (Oxford, England)</jtitle><addtitle>Bioinformatics</addtitle><date>2023-12-01</date><risdate>2023</risdate><volume>39</volume><issue>12</issue><issn>1367-4811</issn><eissn>1367-4811</eissn><abstract>Abstract Motivation Identifying the functional sites of a protein, such as the binding sites of proteins, peptides, or other biological components, is crucial for understanding related biological processes and drug design. However, existing sequence-based methods have limited predictive accuracy, as they only consider sequence-adjacent contextual features and lack structural information. Results In this study, DeepProSite is presented as a new framework for identifying protein binding site that utilizes protein structure and sequence information. DeepProSite first generates protein structures from ESMFold and sequence representations from pretrained language models. It then uses Graph Transformer and formulates binding site predictions as graph node classifications. In predicting protein–protein/peptide binding sites, DeepProSite outperforms state-of-the-art sequence- and structure-based methods on most metrics. Moreover, DeepProSite maintains its performance when predicting unbound structures, in contrast to competing structure-based prediction methods. DeepProSite is also extended to the prediction of binding sites for nucleic acids and other ligands, verifying its generalization capability. Finally, an online server for predicting multiple types of residue is established as the implementation of the proposed DeepProSite. 
Availability and implementation The datasets and source codes can be accessed at https://github.com/WeiLab-Biology/DeepProSite. The proposed DeepProSite can be accessed at https://inner.wei-group.net/DeepProSite/.</abstract><cop>England</cop><pub>Oxford University Press</pub><pmid>38015872</pmid><doi>10.1093/bioinformatics/btad718</doi><orcidid>https://orcid.org/0000-0002-4644-1464</orcidid><orcidid>https://orcid.org/0000-0003-1444-190X</orcidid><orcidid>https://orcid.org/0000-0003-4200-7502</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1367-4811
ispartof Bioinformatics (Oxford, England), 2023-12, Vol.39 (12)
issn 1367-4811
1367-4811
language eng
recordid cdi_proquest_miscellaneous_2895261823
source MEDLINE; PubMed Central; Directory of Open Access Journals; Alma/SFX Local Collection; EZB Electronic Journals Library; Oxford Academic Journals (Open Access)
subjects Binding Sites
Peptides
Protein Binding
Proteins - chemistry
Software
title DeepProSite: structure-aware protein binding site prediction using ESMFold and pretrained language model
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-10T19%3A45%3A02IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=DeepProSite:%20structure-aware%20protein%20binding%20site%20prediction%20using%20ESMFold%20and%20pretrained%20language%20model&rft.jtitle=Bioinformatics%20(Oxford,%20England)&rft.au=Fang,%20Yitian&rft.date=2023-12-01&rft.volume=39&rft.issue=12&rft.issn=1367-4811&rft.eissn=1367-4811&rft_id=info:doi/10.1093/bioinformatics/btad718&rft_dat=%3Cproquest_cross%3E2895261823%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2895261823&rft_id=info:pmid/38015872&rft_oup_id=10.1093/bioinformatics/btad718&rfr_iscdi=true