Neural networks to learn protein sequence–function relationships from deep mutational scanning data

The mapping from protein sequence to function is highly complex, making it challenging to predict how sequence changes will affect a protein’s behavior and properties. We present a supervised deep learning framework to learn the sequence–function mapping from deep mutational scanning data and make pr...

Bibliographic details
Published in: Proceedings of the National Academy of Sciences - PNAS, 2021-11, Vol.118 (48), p.1-12
Main authors: Gelman, Sam; Fahlberg, Sarah A.; Heinzelman, Pete; Romero, Philip A.; Gitter, Anthony
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 12
container_issue 48
container_start_page 1
container_title Proceedings of the National Academy of Sciences - PNAS
container_volume 118
creator Gelman, Sam
Fahlberg, Sarah A.
Heinzelman, Pete
Romero, Philip A.
Gitter, Anthony
description The mapping from protein sequence to function is highly complex, making it challenging to predict how sequence changes will affect a protein’s behavior and properties. We present a supervised deep learning framework to learn the sequence–function mapping from deep mutational scanning data and make predictions for new, uncharacterized sequence variants. We test multiple neural network architectures, including a graph convolutional network that incorporates protein structure, to explore how a network’s internal representation affects its ability to learn the sequence–function mapping. Our supervised learning approach displays superior performance over physics-based and unsupervised prediction methods. We find that networks that capture nonlinear interactions and share parameters across sequence positions are important for learning the relationship between sequence and function. Further analysis of the trained models reveals the networks’ ability to learn biologically meaningful information about protein structure and mechanism. Finally, we demonstrate the models’ ability to navigate sequence space and design new proteins beyond the training set. We applied the protein G B1 domain (GB1) models to design a sequence that binds to immunoglobulin G with substantially higher affinity than wild-type GB1.
doi_str_mv 10.1073/pnas.2104878118
format Article
fullrecord <record><control><sourceid>jstor_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_8640744</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><jstor_id>27094057</jstor_id><sourcerecordid>27094057</sourcerecordid><originalsourceid>FETCH-LOGICAL-c470t-fc759ac1390edf30f739d68e511e59e5284807e584d35c1d4e13d02acd0afd043</originalsourceid><addsrcrecordid>eNpdkcuOFCEUhitG47Sja1caohs3NXMooIDNJGbiLZnoRtcE4dQ0bTWUQGlmN-_gG_okVk-P7WUF4Xz8cM7XNI8pnFCQ7HSKtpx0FLiSilJ1p1lR0LTtuYa7zQqgk63iHT9qHpSyAQAtFNxvjhhXVDCmVg2-xznbkUSs31P-UkhNZESbI5lyqhgiKfh1xujw5_WPYY6uhhRJxtHuNmUdpkKGnLbEI05kO9eb8yWwOBtjiJfE22ofNvcGOxZ8dLseN59ev_p4_ra9-PDm3fnLi9ZxCbUdnBTaOso0oB8YDJJp3ysUlKLQKDrFFUgUinsmHPUcKfPQWefBDh44O27O9rnT_HmL3mGsS3NmymFr85VJNph_KzGszWX6ZlTPQfJdwLN9QCo1mOJCRbd2KUZ01VCtOqbFAr24fSWnZTalmm0oDsfRRkxzMV0PVGu-SFjQ5_-hmzTnZT43lGDAe6EW6nRPuZxKyTgcfkzB7DybnWfzx_Ny4-nfjR7432IX4Mke2JSa8qHeSdAchGS_APIxsME</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2605304658</pqid></control><display><type>article</type><title>Neural networks to learn protein sequence–function relationships from deep mutational scanning data</title><source>MEDLINE</source><source>Jstor Complete Legacy</source><source>PubMed Central</source><source>Alma/SFX Local Collection</source><source>Free Full-Text Journals in Chemistry</source><creator>Gelman, Sam ; Fahlberg, Sarah A. ; Heinzelman, Pete ; Romero, Philip A. ; Gitter, Anthony</creator><creatorcontrib>Gelman, Sam ; Fahlberg, Sarah A. ; Heinzelman, Pete ; Romero, Philip A. ; Gitter, Anthony ; Univ. of Wisconsin, Madison, WI (United States) ; Argonne National Laboratory (ANL), Argonne, IL (United States). 
Advanced Photon Source (APS)</creatorcontrib><description>The mapping from protein sequence to function is highly complex, making it challenging to predict how sequence changes will affect a protein’s behavior and properties. We present a supervised deep learning framework to learn the sequence–function mapping from deep mutational scanning data and make predictions for new, uncharacterized sequence variants. We test multiple neural network architectures, including a graph convolutional network that incorporates protein structure, to explore how a network’s internal representation affects its ability to learn the sequence–function mapping. Our supervised learning approach displays superior performance over physics-based and unsupervised prediction methods. We find that networks that capture nonlinear interactions and share parameters across sequence positions are important for learning the relationship between sequence and function. Further analysis of the trained models reveals the networks’ ability to learn biologically meaningful information about protein structure and mechanism. Finally, we demonstrate the models’ ability to navigate sequence space and design new proteins beyond the training set. 
We applied the protein G B1 domain (GB1) models to design a sequence that binds to immunoglobulin G with substantially higher affinity than wild-type GB1.</description><identifier>ISSN: 0027-8424</identifier><identifier>EISSN: 1091-6490</identifier><identifier>DOI: 10.1073/pnas.2104878118</identifier><identifier>PMID: 34815338</identifier><language>eng</language><publisher>United States: National Academy of Sciences</publisher><subject>Algorithms ; Amino acid sequence ; Amino Acid Sequence - genetics ; Amino Acid Sequence - physiology ; Artificial neural networks ; BASIC BIOLOGICAL SCIENCES ; Biochemical Phenomena ; Biological Sciences ; Computer architecture ; Deep Learning ; IgG antibody ; Immunoglobulin G ; Machine Learning ; Mutation ; Neural networks ; Neural Networks, Computer ; Peptide mapping ; Physical Sciences ; Protein G ; Protein structure ; Proteins ; Proteins - metabolism ; Scanning ; Science &amp; Technology - Other Topics ; Sequence Analysis, Protein - methods ; Structure-Activity Relationship</subject><ispartof>Proceedings of the National Academy of Sciences - PNAS, 2021-11, Vol.118 (48), p.1-12</ispartof><rights>Copyright © 2021 the Author(s). Published by PNAS.</rights><rights>Copyright National Academy of Sciences Nov 30, 2021</rights><rights>Copyright © 2021 the Author(s). Published by PNAS. 
2021</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c470t-fc759ac1390edf30f739d68e511e59e5284807e584d35c1d4e13d02acd0afd043</citedby><cites>FETCH-LOGICAL-c470t-fc759ac1390edf30f739d68e511e59e5284807e584d35c1d4e13d02acd0afd043</cites><orcidid>0000-0002-5324-9833 ; 0000-0001-9537-0976 ; 0000-0002-2586-7263 ; 0000-0002-5588-3731 ; 0000000195370976 ; 0000000253249833 ; 0000000225867263 ; 0000000255883731</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.jstor.org/stable/pdf/27094057$$EPDF$$P50$$Gjstor$$H</linktopdf><linktohtml>$$Uhttps://www.jstor.org/stable/27094057$$EHTML$$P50$$Gjstor$$H</linktohtml><link.rule.ids>230,314,724,777,781,800,882,27905,27906,53772,53774,57998,58231</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/34815338$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink><backlink>$$Uhttps://www.osti.gov/servlets/purl/1982395$$D View this record in Osti.gov$$Hfree_for_read</backlink></links><search><creatorcontrib>Gelman, Sam</creatorcontrib><creatorcontrib>Fahlberg, Sarah A.</creatorcontrib><creatorcontrib>Heinzelman, Pete</creatorcontrib><creatorcontrib>Romero, Philip A.</creatorcontrib><creatorcontrib>Gitter, Anthony</creatorcontrib><creatorcontrib>Univ. of Wisconsin, Madison, WI (United States)</creatorcontrib><creatorcontrib>Argonne National Laboratory (ANL), Argonne, IL (United States). 
Advanced Photon Source (APS)</creatorcontrib><title>Neural networks to learn protein sequence–function relationships from deep mutational scanning data</title><title>Proceedings of the National Academy of Sciences - PNAS</title><addtitle>Proc Natl Acad Sci U S A</addtitle><description>The mapping from protein sequence to function is highly complex, making it challenging to predict how sequence changes will affect a protein’s behavior and properties. We present a supervised deep learning framework to learn the sequence–function mapping from deep mutational scanning data and make predictions for new, uncharacterized sequence variants. We test multiple neural network architectures, including a graph convolutional network that incorporates protein structure, to explore how a network’s internal representation affects its ability to learn the sequence–function mapping. Our supervised learning approach displays superior performance over physics-based and unsupervised prediction methods. We find that networks that capture nonlinear interactions and share parameters across sequence positions are important for learning the relationship between sequence and function. Further analysis of the trained models reveals the networks’ ability to learn biologically meaningful information about protein structure and mechanism. Finally, we demonstrate the models’ ability to navigate sequence space and design new proteins beyond the training set. 
We applied the protein G B1 domain (GB1) models to design a sequence that binds to immunoglobulin G with substantially higher affinity than wild-type GB1.</description><subject>Algorithms</subject><subject>Amino acid sequence</subject><subject>Amino Acid Sequence - genetics</subject><subject>Amino Acid Sequence - physiology</subject><subject>Artificial neural networks</subject><subject>BASIC BIOLOGICAL SCIENCES</subject><subject>Biochemical Phenomena</subject><subject>Biological Sciences</subject><subject>Computer architecture</subject><subject>Deep Learning</subject><subject>IgG antibody</subject><subject>Immunoglobulin G</subject><subject>Machine Learning</subject><subject>Mutation</subject><subject>Neural networks</subject><subject>Neural Networks, Computer</subject><subject>Peptide mapping</subject><subject>Physical Sciences</subject><subject>Protein G</subject><subject>Protein structure</subject><subject>Proteins</subject><subject>Proteins - metabolism</subject><subject>Scanning</subject><subject>Science &amp; Technology - Other Topics</subject><subject>Sequence Analysis, Protein - methods</subject><subject>Structure-Activity Relationship</subject><issn>0027-8424</issn><issn>1091-6490</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNpdkcuOFCEUhitG47Sja1caohs3NXMooIDNJGbiLZnoRtcE4dQ0bTWUQGlmN-_gG_okVk-P7WUF4Xz8cM7XNI8pnFCQ7HSKtpx0FLiSilJ1p1lR0LTtuYa7zQqgk63iHT9qHpSyAQAtFNxvjhhXVDCmVg2-xznbkUSs31P-UkhNZESbI5lyqhgiKfh1xujw5_WPYY6uhhRJxtHuNmUdpkKGnLbEI05kO9eb8yWwOBtjiJfE22ofNvcGOxZ8dLseN59ev_p4_ra9-PDm3fnLi9ZxCbUdnBTaOso0oB8YDJJp3ysUlKLQKDrFFUgUinsmHPUcKfPQWefBDh44O27O9rnT_HmL3mGsS3NmymFr85VJNph_KzGszWX6ZlTPQfJdwLN9QCo1mOJCRbd2KUZ01VCtOqbFAr24fSWnZTalmm0oDsfRRkxzMV0PVGu-SFjQ5_-hmzTnZT43lGDAe6EW6nRPuZxKyTgcfkzB7DybnWfzx_Ny4-nfjR7432IX4Mke2JSa8qHeSdAchGS_APIxsME</recordid><startdate>20211130</startdate><enddate>20211130</enddate><creator>Gelman, 
Sam</creator><creator>Fahlberg, Sarah A.</creator><creator>Heinzelman, Pete</creator><creator>Romero, Philip A.</creator><creator>Gitter, Anthony</creator><general>National Academy of Sciences</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7QG</scope><scope>7QL</scope><scope>7QP</scope><scope>7QR</scope><scope>7SN</scope><scope>7SS</scope><scope>7T5</scope><scope>7TK</scope><scope>7TM</scope><scope>7TO</scope><scope>7U9</scope><scope>8FD</scope><scope>C1K</scope><scope>FR3</scope><scope>H94</scope><scope>M7N</scope><scope>P64</scope><scope>RC3</scope><scope>7X8</scope><scope>OIOZB</scope><scope>OTOTI</scope><scope>5PM</scope><orcidid>https://orcid.org/0000-0002-5324-9833</orcidid><orcidid>https://orcid.org/0000-0001-9537-0976</orcidid><orcidid>https://orcid.org/0000-0002-2586-7263</orcidid><orcidid>https://orcid.org/0000-0002-5588-3731</orcidid><orcidid>https://orcid.org/0000000195370976</orcidid><orcidid>https://orcid.org/0000000253249833</orcidid><orcidid>https://orcid.org/0000000225867263</orcidid><orcidid>https://orcid.org/0000000255883731</orcidid></search><sort><creationdate>20211130</creationdate><title>Neural networks to learn protein sequence–function relationships from deep mutational scanning data</title><author>Gelman, Sam ; Fahlberg, Sarah A. ; Heinzelman, Pete ; Romero, Philip A. 
; Gitter, Anthony</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c470t-fc759ac1390edf30f739d68e511e59e5284807e584d35c1d4e13d02acd0afd043</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Algorithms</topic><topic>Amino acid sequence</topic><topic>Amino Acid Sequence - genetics</topic><topic>Amino Acid Sequence - physiology</topic><topic>Artificial neural networks</topic><topic>BASIC BIOLOGICAL SCIENCES</topic><topic>Biochemical Phenomena</topic><topic>Biological Sciences</topic><topic>Computer architecture</topic><topic>Deep Learning</topic><topic>IgG antibody</topic><topic>Immunoglobulin G</topic><topic>Machine Learning</topic><topic>Mutation</topic><topic>Neural networks</topic><topic>Neural Networks, Computer</topic><topic>Peptide mapping</topic><topic>Physical Sciences</topic><topic>Protein G</topic><topic>Protein structure</topic><topic>Proteins</topic><topic>Proteins - metabolism</topic><topic>Scanning</topic><topic>Science &amp; Technology - Other Topics</topic><topic>Sequence Analysis, Protein - methods</topic><topic>Structure-Activity Relationship</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Gelman, Sam</creatorcontrib><creatorcontrib>Fahlberg, Sarah A.</creatorcontrib><creatorcontrib>Heinzelman, Pete</creatorcontrib><creatorcontrib>Romero, Philip A.</creatorcontrib><creatorcontrib>Gitter, Anthony</creatorcontrib><creatorcontrib>Univ. of Wisconsin, Madison, WI (United States)</creatorcontrib><creatorcontrib>Argonne National Laboratory (ANL), Argonne, IL (United States). 
Advanced Photon Source (APS)</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Animal Behavior Abstracts</collection><collection>Bacteriology Abstracts (Microbiology B)</collection><collection>Calcium &amp; Calcified Tissue Abstracts</collection><collection>Chemoreception Abstracts</collection><collection>Ecology Abstracts</collection><collection>Entomology Abstracts (Full archive)</collection><collection>Immunology Abstracts</collection><collection>Neurosciences Abstracts</collection><collection>Nucleic Acids Abstracts</collection><collection>Oncogenes and Growth Factors Abstracts</collection><collection>Virology and AIDS Abstracts</collection><collection>Technology Research Database</collection><collection>Environmental Sciences and Pollution Management</collection><collection>Engineering Research Database</collection><collection>AIDS and Cancer Research Abstracts</collection><collection>Algology Mycology and Protozoology Abstracts (Microbiology C)</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><collection>OSTI.GOV - Hybrid</collection><collection>OSTI.GOV</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Proceedings of the National Academy of Sciences - PNAS</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Gelman, Sam</au><au>Fahlberg, Sarah A.</au><au>Heinzelman, Pete</au><au>Romero, Philip A.</au><au>Gitter, Anthony</au><aucorp>Univ. of Wisconsin, Madison, WI (United States)</aucorp><aucorp>Argonne National Laboratory (ANL), Argonne, IL (United States). 
Advanced Photon Source (APS)</aucorp><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Neural networks to learn protein sequence–function relationships from deep mutational scanning data</atitle><jtitle>Proceedings of the National Academy of Sciences - PNAS</jtitle><addtitle>Proc Natl Acad Sci U S A</addtitle><date>2021-11-30</date><risdate>2021</risdate><volume>118</volume><issue>48</issue><spage>1</spage><epage>12</epage><pages>1-12</pages><issn>0027-8424</issn><eissn>1091-6490</eissn><abstract>The mapping from protein sequence to function is highly complex, making it challenging to predict how sequence changes will affect a protein’s behavior and properties. We present a supervised deep learning framework to learn the sequence–function mapping from deep mutational scanning data and make predictions for new, uncharacterized sequence variants. We test multiple neural network architectures, including a graph convolutional network that incorporates protein structure, to explore how a network’s internal representation affects its ability to learn the sequence–function mapping. Our supervised learning approach displays superior performance over physics-based and unsupervised prediction methods. We find that networks that capture nonlinear interactions and share parameters across sequence positions are important for learning the relationship between sequence and function. Further analysis of the trained models reveals the networks’ ability to learn biologically meaningful information about protein structure and mechanism. Finally, we demonstrate the models’ ability to navigate sequence space and design new proteins beyond the training set. 
We applied the protein G B1 domain (GB1) models to design a sequence that binds to immunoglobulin G with substantially higher affinity than wild-type GB1.</abstract><cop>United States</cop><pub>National Academy of Sciences</pub><pmid>34815338</pmid><doi>10.1073/pnas.2104878118</doi><tpages>12</tpages><orcidid>https://orcid.org/0000-0002-5324-9833</orcidid><orcidid>https://orcid.org/0000-0001-9537-0976</orcidid><orcidid>https://orcid.org/0000-0002-2586-7263</orcidid><orcidid>https://orcid.org/0000-0002-5588-3731</orcidid><orcidid>https://orcid.org/0000000195370976</orcidid><orcidid>https://orcid.org/0000000253249833</orcidid><orcidid>https://orcid.org/0000000225867263</orcidid><orcidid>https://orcid.org/0000000255883731</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 0027-8424
ispartof Proceedings of the National Academy of Sciences - PNAS, 2021-11, Vol.118 (48), p.1-12
issn 0027-8424
1091-6490
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_8640744
source MEDLINE; Jstor Complete Legacy; PubMed Central; Alma/SFX Local Collection; Free Full-Text Journals in Chemistry
subjects Algorithms
Amino acid sequence
Amino Acid Sequence - genetics
Amino Acid Sequence - physiology
Artificial neural networks
BASIC BIOLOGICAL SCIENCES
Biochemical Phenomena
Biological Sciences
Computer architecture
Deep Learning
IgG antibody
Immunoglobulin G
Machine Learning
Mutation
Neural networks
Neural Networks, Computer
Peptide mapping
Physical Sciences
Protein G
Protein structure
Proteins
Proteins - metabolism
Scanning
Science & Technology - Other Topics
Sequence Analysis, Protein - methods
Structure-Activity Relationship
title Neural networks to learn protein sequence–function relationships from deep mutational scanning data
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-18T14%3A10%3A32IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-jstor_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Neural%20networks%20to%20learn%20protein%20sequence%E2%80%93function%20relationships%20from%20deep%20mutational%20scanning%20data&rft.jtitle=Proceedings%20of%20the%20National%20Academy%20of%20Sciences%20-%20PNAS&rft.au=Gelman,%20Sam&rft.aucorp=Univ.%20of%20Wisconsin,%20Madison,%20WI%20(United%20States)&rft.date=2021-11-30&rft.volume=118&rft.issue=48&rft.spage=1&rft.epage=12&rft.pages=1-12&rft.issn=0027-8424&rft.eissn=1091-6490&rft_id=info:doi/10.1073/pnas.2104878118&rft_dat=%3Cjstor_pubme%3E27094057%3C/jstor_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2605304658&rft_id=info:pmid/34815338&rft_jstor_id=27094057&rfr_iscdi=true
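
The link resolver URL above packs the citation into percent-encoded OpenURL (Z39.88-2004) query parameters. As a minimal sketch, assuming only Python's standard `urllib.parse` module, the parameters could be decoded as follows. The helper name `openurl_fields` is hypothetical, and `example` is a shortened copy of the link that keeps only a few of its parameters.

```python
from urllib.parse import urlsplit, parse_qs


def openurl_fields(url: str) -> dict:
    """Return the percent-decoded query parameters of an OpenURL link.

    Each key maps to a list, because a key such as rft_id may repeat.
    """
    return parse_qs(urlsplit(url).query)


# Shortened copy of the resolver link (illustration only, not the full URL).
example = (
    "https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004"
    "&rft.genre=article"
    "&rft.au=Gelman,%20Sam"
    "&rft.date=2021-11-30"
    "&rft.volume=118&rft.issue=48"
    "&rft_id=info:doi/10.1073/pnas.2104878118"
    "&rft_id=info:pmid/34815338"
)

fields = openurl_fields(example)
print(fields["rft.au"])   # first-author name with %20 decoded to a space
print(fields["rft_id"])   # both identifiers, in the order they appear
```

Values are returned as lists so that the repeated `rft_id` key keeps both the DOI and the PMID instead of one overwriting the other.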