Large-scale design and refinement of stable proteins using sequence-only models
Engineered proteins generally must possess a stable structure in order to achieve their designed function. Stable designs, however, are astronomically rare within the space of all possible amino acid sequences. As a consequence, many designs must be tested computationally and experimentally in order...
Published in: | PloS one 2022-03, Vol.17 (3), p.e0265020 |
---|---|
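Main authors: | Singer, Jedediah M; Novotney, Scott; Strickland, Devin; Haddox, Hugh K; Leiby, Nicholas; Rocklin, Gabriel J; Chow, Cameron M; Roy, Anindya; Bera, Asim K; Motta, Francis C; Cao, Longxing; Strauch, Eva-Maria; Chidyausiku, Tamuka M; Ford, Alex; Ho, Ethan; Zaitzeff, Alexander; Mackenzie, Craig O; Eramian, Hamed; DiMaio, Frank; Grigoryan, Gevorg; Vaughn, Matthew; Stewart, Lance J; Baker, David; Klavins, Eric |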
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | |
---|---|
container_issue | 3 |
container_start_page | e0265020 |
container_title | PloS one |
container_volume | 17 |
creator | Singer, Jedediah M; Novotney, Scott; Strickland, Devin; Haddox, Hugh K; Leiby, Nicholas; Rocklin, Gabriel J; Chow, Cameron M; Roy, Anindya; Bera, Asim K; Motta, Francis C; Cao, Longxing; Strauch, Eva-Maria; Chidyausiku, Tamuka M; Ford, Alex; Ho, Ethan; Zaitzeff, Alexander; Mackenzie, Craig O; Eramian, Hamed; DiMaio, Frank; Grigoryan, Gevorg; Vaughn, Matthew; Stewart, Lance J; Baker, David; Klavins, Eric |
description | Engineered proteins generally must possess a stable structure in order to achieve their designed function. Stable designs, however, are astronomically rare within the space of all possible amino acid sequences. As a consequence, many designs must be tested computationally and experimentally in order to find stable ones, which is expensive in terms of time and resources. Here we use a high-throughput, low-fidelity assay to experimentally evaluate the stability of approximately 200,000 novel proteins. These include a wide range of sequence perturbations, providing a baseline for future work in the field. We build a neural network model that predicts protein stability given only sequences of amino acids, and compare its performance to the assayed values. We also report another network model that is able to generate the amino acid sequences of novel stable proteins given requested secondary sequences. Finally, we show that the predictive model, despite weaknesses including a noisy data set, can be used to substantially increase the stability of both expert-designed and model-generated proteins. |
doi_str_mv | 10.1371/journal.pone.0265020 |
format | Article |
fullrecord | Record ID: TN_cdi_plos_journals_2638936796 (source format: XML). Type: article. ISSN: 1932-6203; EISSN: 1932-6203. DOI: 10.1371/journal.pone.0265020. PMID: 35286324. Publisher: Public Library of Science (United States). Publication date: 2022-03-14. Peer reviewed; open access (free for read). Rights: COPYRIGHT 2022 Public Library of Science. 2022 Singer et al. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the "License"), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. Author ORCID iDs: https://orcid.org/0000-0001-5351-6412; https://orcid.org/0000-0002-1384-4283; https://orcid.org/0000-0002-2561-782X; https://orcid.org/0000-0002-7070-3321; https://orcid.org/0000-0002-0364-5440; https://orcid.org/0000-0002-5255-4473. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8920274/ (PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8920274/pdf/). MEDLINE/PubMed record: https://www.ncbi.nlm.nih.gov/pubmed/35286324 |
fulltext | fulltext |
identifier | ISSN: 1932-6203 |
ispartof | PloS one, 2022-03, Vol.17 (3), p.e0265020 |
issn | 1932-6203 |
language | eng |
recordid | cdi_plos_journals_2638936796 |
source | MEDLINE; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; PubMed Central; Free Full-Text Journals in Chemistry; Public Library of Science (PLoS) |
subjects | Accuracy; Amino Acid Sequence; Amino Acids; Automation; Biochemistry; Biology and Life Sciences; Computer engineering; Datasets; Design; Machine learning; Methods; Modelling; Mutation; Neural networks; Neural Networks, Computer; Perturbation; Physical Sciences; Physics; Prediction models; Protein Stability; Proteins; Proteins - chemistry; Recombinant proteins; Research and Analysis Methods; Stability analysis |
title | Large-scale design and refinement of stable proteins using sequence-only models |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-03T02%3A47%3A06IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_plos_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Large-scale%20design%20and%20refinement%20of%20stable%20proteins%20using%20sequence-only%20models&rft.jtitle=PloS%20one&rft.au=Singer,%20Jedediah%20M&rft.date=2022-03-14&rft.volume=17&rft.issue=3&rft.spage=e0265020&rft.pages=e0265020-&rft.issn=1932-6203&rft.eissn=1932-6203&rft_id=info:doi/10.1371/journal.pone.0265020&rft_dat=%3Cgale_plos_%3EA696721642%3C/gale_plos_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2638936796&rft_id=info:pmid/35286324&rft_galeid=A696721642&rft_doaj_id=oai_doaj_org_article_6bc977c80731465da8bb65f2719bfb0c&rfr_iscdi=true |