De novo protein fold families expand the designable ligand binding site space

A major challenge in designing proteins de novo to bind user-defined ligands with high affinity is finding backbone structures into which a new binding site geometry can be engineered with high precision. Recent advances in methods to generate protein fold families de novo have expanded the space o...

Full description

Bibliographic details
Published in: PLoS computational biology 2021-11, Vol.17 (11), p.e1009620-e1009620
Main authors: Pan, Xingjie, Kortemme, Tanja
Format: Article
Language: eng
Keywords:
Online access: Full text
container_end_page e1009620
container_issue 11
container_start_page e1009620
container_title PLoS computational biology
container_volume 17
creator Pan, Xingjie
Kortemme, Tanja
description A major challenge in designing proteins de novo to bind user-defined ligands with high affinity is finding backbone structures into which a new binding site geometry can be engineered with high precision. Recent advances in methods to generate protein fold families de novo have expanded the space of accessible protein structures, but it is not clear to what extent de novo proteins with diverse geometries also expand the space of designable ligand binding functions. We constructed a library of 25,806 high-quality ligand binding sites and developed a fast protocol to place ("match") these binding sites into both naturally occurring and de novo protein families with two fold topologies: Rossmann and NTF2. Each matching step involves engineering new binding site residues into each protein "scaffold", which is distinct from the problem of comparing already existing binding pockets. 5,896 and 7,475 binding sites could be matched to the Rossmann and NTF2 fold families, respectively. De novo designed Rossmann and NTF2 protein families can support 1,791 and 678 binding sites, respectively, that cannot be matched to naturally existing structures with the same topologies. While the number of protein residues in ligand binding sites is the major determinant of matching success, ligand size and primary sequence separation of binding site residues also play important roles. The number of matched binding sites is a power-law function of the number of members in a fold family. Our results suggest that de novo sampling of geometric variations on diverse fold topologies can significantly expand the space of designable ligand binding sites for a wealth of possible new protein functions.
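The power-law relationship stated in the abstract can be illustrated with a minimal sketch. It assumes the relationship takes the form matched ≈ a · N^b, where N is the number of members in a fold family, and estimates a and b by least squares on log-log axes. The function name `fit_power_law` and all data points are hypothetical and synthetic; they are not taken from the article, and this is not the authors' analysis code.

```python
import math


def fit_power_law(sizes, counts):
    """Least-squares fit of counts ≈ a * sizes**b, done on log-log axes.

    Hypothetical helper for illustration only.
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope of the regression line in log space is the exponent b.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    # Intercept in log space is log(a).
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Synthetic data generated from an exact power law (a = 50, b = 0.5);
# these numbers are assumptions, not results from the paper.
family_sizes = [10, 100, 1000, 10000]
matched_sites = [50.0 * n ** 0.5 for n in family_sizes]

a, b = fit_power_law(family_sizes, matched_sites)
print(f"a = {a:.2f}, b = {b:.2f}")
```

An exponent b below 1 would mean that each additional family member contributes fewer new matched binding sites than the previous one, which is one way to read a power-law dependence on family size.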
doi_str_mv 10.1371/journal.pcbi.1009620
format Article
fullrecord <record>
<control><sourceid>gale_plos_</sourceid><recordid>TN_cdi_plos_journals_2610945859</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A684564216</galeid><doaj_id>oai_doaj_org_article_5a271d29e74c47d5abad432c3b3e05c7</doaj_id><sourcerecordid>A684564216</sourcerecordid><originalsourceid>FETCH-LOGICAL-c5760-47f61a02cee5b956ad212e80a61b1a76c01f838c59494ddb007b5c267c86b5e93</originalsourceid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2610945859</pqid></control>
<display><type>article</type><title>De novo protein fold families expand the designable ligand binding site space</title><source>MEDLINE</source><source>DOAJ Directory of Open Access Journals</source><source>Public Library of Science (PLoS)</source><source>EZB-FREE-00999 freely available EZB journals</source><source>PubMed Central</source><creator>Pan, Xingjie ; Kortemme, Tanja</creator><contributor>Keskin, Ozlem</contributor><creatorcontrib>Pan, Xingjie ; Kortemme, Tanja ; Keskin, Ozlem</creatorcontrib><description>A major challenge in designing proteins de novo to bind user-defined ligands with high affinity is finding backbone structures into which a new binding site geometry can be engineered with high precision. Recent advances in methods to generate protein fold families de novo have expanded the space of accessible protein structures, but it is not clear to what extent de novo proteins with diverse geometries also expand the space of designable ligand binding functions. We constructed a library of 25,806 high-quality ligand binding sites and developed a fast protocol to place ("match") these binding sites into both naturally occurring and de novo protein families with two fold topologies: Rossmann and NTF2. Each matching step involves engineering new binding site residues into each protein "scaffold", which is distinct from the problem of comparing already existing binding pockets. 5,896 and 7,475 binding sites could be matched to the Rossmann and NTF2 fold families, respectively. De novo designed Rossmann and NTF2 protein families can support 1,791 and 678 binding sites, respectively, that cannot be matched to naturally existing structures with the same topologies. While the number of protein residues in ligand binding sites is the major determinant of matching success, ligand size and primary sequence separation of binding site residues also play important roles. The number of matched binding sites is a power-law function of the number of members in a fold family. Our results suggest that de novo sampling of geometric variations on diverse fold topologies can significantly expand the space of designable ligand binding sites for a wealth of possible new protein functions.</description><identifier>ISSN: 1553-7358</identifier><identifier>ISSN: 1553-734X</identifier><identifier>EISSN: 1553-7358</identifier><identifier>DOI: 10.1371/journal.pcbi.1009620</identifier><identifier>PMID: 34807909</identifier><language>eng</language><publisher>United States: Public Library of Science</publisher><subject>Amino acid sequence ; Amino acids ; Binding Sites ; Binding sites (Biochemistry) ; Biology and Life Sciences ; Biosensors ; Design ; Engineering and Technology ; Geometry ; Libraries ; Ligand binding (Biochemistry) ; Ligands ; Matching ; Methods ; Protein Conformation ; Protein families ; Protein Folding ; Protein research ; Proteins ; Research and Analysis Methods ; Residues ; Topology</subject><ispartof>PLoS computational biology, 2021-11, Vol.17 (11), p.e1009620-e1009620</ispartof><rights>COPYRIGHT 2021 Public Library of Science</rights><rights>2021 Pan, Kortemme. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><rights>2021 Pan, Kortemme</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c5760-47f61a02cee5b956ad212e80a61b1a76c01f838c59494ddb007b5c267c86b5e93</citedby><cites>FETCH-LOGICAL-c5760-47f61a02cee5b956ad212e80a61b1a76c01f838c59494ddb007b5c267c86b5e93</cites><orcidid>0000-0002-0060-4352 ; 0000-0002-8494-680X</orcidid></display>
<links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC8648124/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC8648124/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,727,780,784,864,885,2102,2928,23866,27924,27925,53791,53793,79600,79601</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/34807909$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links>
<delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery>
<addata><au>Pan, Xingjie</au><au>Kortemme, Tanja</au><au>Keskin, Ozlem</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>De novo protein fold families expand the designable ligand binding site space</atitle><jtitle>PLoS computational biology</jtitle><addtitle>PLoS Comput Biol</addtitle><date>2021-11-22</date><risdate>2021</risdate><volume>17</volume><issue>11</issue><spage>e1009620</spage><epage>e1009620</epage><pages>e1009620-e1009620</pages><issn>1553-7358</issn><issn>1553-734X</issn><eissn>1553-7358</eissn><cop>United States</cop><pub>Public Library of Science</pub><pmid>34807909</pmid><doi>10.1371/journal.pcbi.1009620</doi><orcidid>https://orcid.org/0000-0002-0060-4352</orcidid><orcidid>https://orcid.org/0000-0002-8494-680X</orcidid><oa>free_for_read</oa></addata>
</record>
fulltext fulltext
identifier ISSN: 1553-7358
ispartof PLoS computational biology, 2021-11, Vol.17 (11), p.e1009620-e1009620
issn 1553-7358
1553-734X
1553-7358
language eng
recordid cdi_plos_journals_2610945859
source MEDLINE; DOAJ Directory of Open Access Journals; Public Library of Science (PLoS); EZB-FREE-00999 freely available EZB journals; PubMed Central
subjects Amino acid sequence
Amino acids
Binding Sites
Binding sites (Biochemistry)
Biology and Life Sciences
Biosensors
Design
Engineering and Technology
Geometry
Libraries
Ligand binding (Biochemistry)
Ligands
Matching
Methods
Protein Conformation
Protein families
Protein Folding
Protein research
Proteins
Research and Analysis Methods
Residues
Topology
title De novo protein fold families expand the designable ligand binding site space
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-06T18%3A02%3A10IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_plos_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=De%20novo%20protein%20fold%20families%20expand%20the%20designable%20ligand%20binding%20site%20space&rft.jtitle=PLoS%20computational%20biology&rft.au=Pan,%20Xingjie&rft.date=2021-11-22&rft.volume=17&rft.issue=11&rft.spage=e1009620&rft.epage=e1009620&rft.pages=e1009620-e1009620&rft.issn=1553-7358&rft.eissn=1553-7358&rft_id=info:doi/10.1371/journal.pcbi.1009620&rft_dat=%3Cgale_plos_%3EA684564216%3C/gale_plos_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2610945859&rft_id=info:pmid/34807909&rft_galeid=A684564216&rft_doaj_id=oai_doaj_org_article_5a271d29e74c47d5abad432c3b3e05c7&rfr_iscdi=true