De novo protein fold families expand the designable ligand binding site space
A major challenge in designing proteins de novo to bind user-defined ligands with high affinity is finding backbone structures into which a new binding site geometry can be engineered with high precision. Recent advances in methods to generate protein fold families de novo have expanded the space o...
Published in: | PLoS computational biology 2021-11, Vol.17 (11), p.e1009620-e1009620 |
---|---|
Main authors: | Pan, Xingjie ; Kortemme, Tanja |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | e1009620 |
---|---|
container_issue | 11 |
container_start_page | e1009620 |
container_title | PLoS computational biology |
container_volume | 17 |
creator | Pan, Xingjie ; Kortemme, Tanja |
description | A major challenge in designing proteins de novo to bind user-defined ligands with high affinity is finding backbone structures into which a new binding site geometry can be engineered with high precision. Recent advances in methods to generate protein fold families de novo have expanded the space of accessible protein structures, but it is not clear to what extent de novo proteins with diverse geometries also expand the space of designable ligand binding functions. We constructed a library of 25,806 high-quality ligand binding sites and developed a fast protocol to place ("match") these binding sites into both naturally occurring and de novo protein families with two fold topologies: Rossmann and NTF2. Each matching step involves engineering new binding site residues into each protein "scaffold", which is distinct from the problem of comparing already existing binding pockets. We could match 5,896 and 7,475 binding sites to the Rossmann and NTF2 fold families, respectively. De novo designed Rossmann and NTF2 protein families can support 1,791 and 678 binding sites, respectively, that cannot be matched to naturally existing structures with the same topologies. While the number of protein residues in ligand binding sites is the major determinant of matching success, ligand size and primary sequence separation of binding site residues also play important roles. The number of matched binding sites is a power-law function of the number of members in a fold family. Our results suggest that de novo sampling of geometric variations on diverse fold topologies can significantly expand the space of designable ligand binding sites for a wealth of possible new protein functions. |
doi_str_mv | 10.1371/journal.pcbi.1009620 |
format | Article |
fulltext | fulltext |
identifier | ISSN: 1553-7358 |
ispartof | PLoS computational biology, 2021-11, Vol.17 (11), p.e1009620-e1009620 |
issn | 1553-7358 1553-734X 1553-7358 |
language | eng |
recordid | cdi_plos_journals_2610945859 |
source | MEDLINE; DOAJ Directory of Open Access Journals; Public Library of Science (PLoS); EZB-FREE-00999 freely available EZB journals; PubMed Central |
subjects | Amino acid sequence ; Amino acids ; Binding Sites ; Binding sites (Biochemistry) ; Biology and Life Sciences ; Biosensors ; Design ; Engineering and Technology ; Geometry ; Libraries ; Ligand binding (Biochemistry) ; Ligands ; Matching ; Methods ; Protein Conformation ; Protein families ; Protein Folding ; Protein research ; Proteins ; Research and Analysis Methods ; Residues ; Topology |
title | De novo protein fold families expand the designable ligand binding site space |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-06T18%3A02%3A10IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_plos_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=De%20novo%20protein%20fold%20families%20expand%20the%20designable%20ligand%20binding%20site%20space&rft.jtitle=PLoS%20computational%20biology&rft.au=Pan,%20Xingjie&rft.date=2021-11-22&rft.volume=17&rft.issue=11&rft.spage=e1009620&rft.epage=e1009620&rft.pages=e1009620-e1009620&rft.issn=1553-7358&rft.eissn=1553-7358&rft_id=info:doi/10.1371/journal.pcbi.1009620&rft_dat=%3Cgale_plos_%3EA684564216%3C/gale_plos_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2610945859&rft_id=info:pmid/34807909&rft_galeid=A684564216&rft_doaj_id=oai_doaj_org_article_5a271d29e74c47d5abad432c3b3e05c7&rfr_iscdi=true |