Neural networks to learn protein sequence–function relationships from deep mutational scanning data
The mapping from protein sequence to function is highly complex, making it challenging to predict how sequence changes will affect a protein’s behavior and properties. We present a supervised deep learning framework to learn the sequence–function mapping from deep mutational scanning data and make pr...
Published in: | Proceedings of the National Academy of Sciences - PNAS 2021-11, Vol.118 (48), p.1-12 |
---|---|
Main authors: | Gelman, Sam; Fahlberg, Sarah A.; Heinzelman, Pete; Romero, Philip A.; Gitter, Anthony |
Format: | Article |
Language: | eng |
Keywords: | |
Online access: | Full text |
container_end_page | 12 |
---|---|
container_issue | 48 |
container_start_page | 1 |
container_title | Proceedings of the National Academy of Sciences - PNAS |
container_volume | 118 |
creator | Gelman, Sam; Fahlberg, Sarah A.; Heinzelman, Pete; Romero, Philip A.; Gitter, Anthony |
description | The mapping from protein sequence to function is highly complex, making it challenging to predict how sequence changes will affect a protein’s behavior and properties. We present a supervised deep learning framework to learn the sequence–function mapping from deep mutational scanning data and make predictions for new, uncharacterized sequence variants. We test multiple neural network architectures, including a graph convolutional network that incorporates protein structure, to explore how a network’s internal representation affects its ability to learn the sequence–function mapping. Our supervised learning approach displays superior performance over physics-based and unsupervised prediction methods. We find that networks that capture nonlinear interactions and share parameters across sequence positions are important for learning the relationship between sequence and function. Further analysis of the trained models reveals the networks’ ability to learn biologically meaningful information about protein structure and mechanism. Finally, we demonstrate the models’ ability to navigate sequence space and design new proteins beyond the training set. We applied the protein G B1 domain (GB1) models to design a sequence that binds to immunoglobulin G with substantially higher affinity than wild-type GB1. |
doi_str_mv | 10.1073/pnas.2104878118 |
format | Article |
publisher | United States: National Academy of Sciences |
date | 2021-11-30 |
addtitle | Proc Natl Acad Sci U S A |
pmid | 34815338 |
jstor_id | 27094057 |
aucorp | Univ. of Wisconsin, Madison, WI (United States); Argonne National Laboratory (ANL), Argonne, IL (United States). Advanced Photon Source (APS) |
orcidid | 0000-0002-5324-9833; 0000-0001-9537-0976; 0000-0002-2586-7263; 0000-0002-5588-3731 |
rights | Copyright © 2021 the Author(s). Published by PNAS. Copyright National Academy of Sciences Nov 30, 2021 |
review_status | peer_reviewed |
oa | free_for_read |
links | PDF: https://www.jstor.org/stable/pdf/27094057; HTML: https://www.jstor.org/stable/27094057; MEDLINE/PubMed: https://www.ncbi.nlm.nih.gov/pubmed/34815338; OSTI.GOV: https://www.osti.gov/servlets/purl/1982395 |
fulltext | fulltext |
identifier | ISSN: 0027-8424 |
ispartof | Proceedings of the National Academy of Sciences - PNAS, 2021-11, Vol.118 (48), p.1-12 |
issn | 0027-8424 (print); 1091-6490 (electronic) |
language | eng |
recordid | cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_8640744 |
source | MEDLINE; Jstor Complete Legacy; PubMed Central; Alma/SFX Local Collection; Free Full-Text Journals in Chemistry |
subjects | Algorithms; Amino acid sequence; Amino Acid Sequence - genetics; Amino Acid Sequence - physiology; Artificial neural networks; BASIC BIOLOGICAL SCIENCES; Biochemical Phenomena; Biological Sciences; Computer architecture; Deep Learning; IgG antibody; Immunoglobulin G; Machine Learning; Mutation; Neural networks; Neural Networks, Computer; Peptide mapping; Physical Sciences; Protein G; Protein structure; Proteins; Proteins - metabolism; Scanning; Science & Technology - Other Topics; Sequence Analysis, Protein - methods; Structure-Activity Relationship |
title | Neural networks to learn protein sequence–function relationships from deep mutational scanning data |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-18T14%3A10%3A32IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-jstor_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Neural%20networks%20to%20learn%20protein%20sequence%E2%80%93function%20relationships%20from%20deep%20mutational%20scanning%20data&rft.jtitle=Proceedings%20of%20the%20National%20Academy%20of%20Sciences%20-%20PNAS&rft.au=Gelman,%20Sam&rft.aucorp=Univ.%20of%20Wisconsin,%20Madison,%20WI%20(United%20States)&rft.date=2021-11-30&rft.volume=118&rft.issue=48&rft.spage=1&rft.epage=12&rft.pages=1-12&rft.issn=0027-8424&rft.eissn=1091-6490&rft_id=info:doi/10.1073/pnas.2104878118&rft_dat=%3Cjstor_pubme%3E27094057%3C/jstor_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2605304658&rft_id=info:pmid/34815338&rft_jstor_id=27094057&rfr_iscdi=true |
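The abstract in the description field reports that networks which capture nonlinear interactions and share parameters across sequence positions learn the sequence–function mapping well. Purely as an illustration of those two ideas, the pure-Python sketch below shows (a) turning deep mutational scanning variants into one-hot encoded inputs for supervised learning, and (b) a single convolutional filter whose weights are reused at every window along the sequence, followed by a ReLU nonlinearity. The mutation notation (e.g. "K2R", 1-based positions), the toy sequence, and every function name here are hypothetical assumptions; this is not the authors' code, encoding, or architecture.

```python
"""Hypothetical sketch, not the paper's implementation: one-hot encoding of
protein variants plus a shared-weight 1D convolution with ReLU."""
from typing import List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 canonical residues (assumed alphabet)
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def apply_mutations(wild_type: str, mutations: str) -> str:
    """Apply comma-separated substitutions such as "K2R,A4G" (assumed
    notation: wild-type residue, 1-based position, new residue)."""
    seq = list(wild_type)
    for mut in filter(None, mutations.split(",")):  # "" means wild type
        wt_aa, pos, new_aa = mut[0], int(mut[1:-1]), mut[-1]
        if seq[pos - 1] != wt_aa:
            raise ValueError(f"{mut}: wild type at position {pos} is {seq[pos - 1]}")
        seq[pos - 1] = new_aa
    return "".join(seq)


def one_hot(seq: str) -> List[List[float]]:
    """Encode a sequence as an L x 20 matrix with exactly one 1.0 per row.
    Non-canonical residues raise KeyError."""
    rows = []
    for aa in seq:
        row = [0.0] * len(AMINO_ACIDS)
        row[AA_INDEX[aa]] = 1.0
        rows.append(row)
    return rows


def conv1d_relu(
    x: Sequence[Sequence[float]],
    kernel: Sequence[Sequence[float]],
    bias: float = 0.0,
) -> List[float]:
    """Slide one kernel (width k x 20) along the sequence. The same weights
    are reused at every window (parameter sharing across positions), and the
    ReLU makes the output a nonlinear function of the input."""
    width = len(kernel)
    out = []
    for start in range(len(x) - width + 1):
        s = bias
        for offset in range(width):
            for j, w in enumerate(kernel[offset]):
                s += w * x[start + offset][j]
        out.append(max(0.0, s))
    return out


if __name__ == "__main__":
    variant = apply_mutations("MKTAYIAK", "K2R,A4G")  # toy sequence, not GB1
    print(variant)  # MRTGYIAK
    # A width-2 filter that responds only to glycine in its first slot.
    kernel = [[0.0] * len(AMINO_ACIDS) for _ in range(2)]
    kernel[0][AA_INDEX["G"]] = 1.0
    print(conv1d_relu(one_hot(variant), kernel))  # [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
```

In a trained model the kernel weights would be learned from the measured functional scores and many filters and layers would be stacked; a graph convolutional variant, as named in the abstract, would instead aggregate information over residues that are neighbors in the protein structure rather than over a fixed sequence window.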