Impact of phylogeny on the inference of functional sectors from protein sequence data
Statistical analysis of multiple sequence alignments of homologous proteins has revealed groups of coevolving amino acids called sectors. These groups of amino-acid sites feature collective correlations in their amino-acid usage, and they are associated with functional properties. Modeling showed that...
Published in: | PLoS computational biology 2024-09, Vol.20 (9), p.e1012091 |
---|---|
Main authors: | Dietler, Nicola ; Abbara, Alia ; Choudhury, Subham ; Bitbol, Anne-Florence |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | |
---|---|
container_issue | 9 |
container_start_page | e1012091 |
container_title | PLoS computational biology |
container_volume | 20 |
creator | Dietler, Nicola ; Abbara, Alia ; Choudhury, Subham ; Bitbol, Anne-Florence |
description | Statistical analysis of multiple sequence alignments of homologous proteins has revealed groups of coevolving amino acids called sectors. These groups of amino-acid sites feature collective correlations in their amino-acid usage, and they are associated with functional properties. Modeling showed that nonlinear selection on an additive functional trait of a protein is generically expected to give rise to a functional sector. These modeling results motivated a principled method, called ICOD, which is designed to identify functional sectors, as well as mutational effects, from sequence data. However, a challenge for all methods aiming to identify sectors from multiple sequence alignments is that correlations in amino-acid usage can also arise from the mere fact that homologous sequences share common ancestry, i.e. from phylogeny. Here, we generate controlled synthetic data from a minimal model comprising both phylogeny and functional sectors. We use this data to dissect the impact of phylogeny on sector identification and on mutational effect inference by different methods. We find that ICOD is most robust to phylogeny, but that conservation is also quite robust. Next, we consider natural multiple sequence alignments of protein families for which deep mutational scan experimental data is available. We show that in this natural data, conservation and ICOD best identify sites with strong functional roles, in agreement with our results on synthetic data. Importantly, these two methods have different premises, since they respectively focus on conservation and on correlations. Thus, their joint use can reveal complementary information. |
doi_str_mv | 10.1371/journal.pcbi.1012091 |
format | Article |
fullrecord | <record><control><sourceid>gale_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_11449291</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A811451952</galeid><sourcerecordid>A811451952</sourcerecordid><originalsourceid>FETCH-LOGICAL-c423t-5a7106b269ef95f93f1bc30e6b51c9d3a58fa6e86a5f18bb08f4a33ee649f33c3</originalsourceid><addsrcrecordid>eNqVkk9rGzEQxUVpaNK036CUhV7ag13NaiWvTiWE_jGEBNrmLLTyyFbYlbaSNtTfvnLshhh6KTpomPm9JzQ8Qt4AnQNbwMe7MEWv-_loOjcHCjWV8IycAedstmC8ff6kPiUvU7qjtJRSvCCnTDKouYQzcrscRm1yFWw1brZ9WKPfVsFXeYOV8xYjeoO7qZ28yS6UF6uEJoeYKhvDUI0xZHS-NH9ND-xKZ_2KnFjdJ3x9uM_J7ZfPPy-_za5uvi4vL65mpqlZnnG9ACq6Wki0klvJLHSGURQdByNXTPPWaoGt0NxC23W0tY1mDFE00jJm2Dn5tPcdp27AlUGfo-7VGN2g41YF7dTxxLuNWod7BdA0spZQHN4fHGIoH0hZDS4Z7HvtMUxJMaDtQjAuZEHf7dG17lGV5YRiaXa4umiLIQfJ60LN_0GVs8LBmeDRutI_Enw4EhQm4--81lNKavnj-3-w18dss2dNDClFtI9rAap2CVKHBKldgtQhQUX29ulKH0V_I8P-AKAwxEY</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3108763569</pqid></control><display><type>article</type><title>Impact of phylogeny on the inference of functional sectors from protein sequence data</title><source>Public Library of Science (PLoS) Journals Open Access</source><source>DOAJ Directory of Open Access Journals</source><source>EZB-FREE-00999 freely available EZB journals</source><source>PubMed Central</source><creator>Dietler, Nicola ; Abbara, Alia ; Choudhury, Subham ; Bitbol, Anne-Florence</creator><contributor>Weigt, Martin</contributor><creatorcontrib>Dietler, Nicola ; Abbara, Alia ; Choudhury, Subham ; Bitbol, Anne-Florence ; Weigt, Martin</creatorcontrib><description>Statistical analysis of multiple sequence alignments of homologous proteins has revealed groups of coevolving amino acids called sectors. These groups of amino-acid sites feature collective correlations in their amino-acid usage, and they are associated to functional properties. 
Modeling showed that nonlinear selection on an additive functional trait of a protein is generically expected to give rise to a functional sector. These modeling results motivated a principled method, called ICOD, which is designed to identify functional sectors, as well as mutational effects, from sequence data. However, a challenge for all methods aiming to identify sectors from multiple sequence alignments is that correlations in amino-acid usage can also arise from the mere fact that homologous sequences share common ancestry, i.e. from phylogeny. Here, we generate controlled synthetic data from a minimal model comprising both phylogeny and functional sectors. We use this data to dissect the impact of phylogeny on sector identification and on mutational effect inference by different methods. We find that ICOD is most robust to phylogeny, but that conservation is also quite robust. Next, we consider natural multiple sequence alignments of protein families for which deep mutational scan experimental data is available. We show that in this natural data, conservation and ICOD best identify sites with strong functional roles, in agreement with our results on synthetic data. Importantly, these two methods have different premises, since they respectively focus on conservation and on correlations. 
Thus, their joint use can reveal complementary information.</description><identifier>ISSN: 1553-7358</identifier><identifier>ISSN: 1553-734X</identifier><identifier>EISSN: 1553-7358</identifier><identifier>DOI: 10.1371/journal.pcbi.1012091</identifier><identifier>PMID: 39312591</identifier><language>eng</language><publisher>United States: Public Library of Science</publisher><subject>Amino acid sequence ; Biology and Life Sciences ; Computer and Information Sciences ; Data mining ; Ecology and Environmental Sciences ; Methods ; Phylogeny ; Physical Sciences ; Protein research ; Proteins ; Statistical models ; Structure</subject><ispartof>PLoS computational biology, 2024-09, Vol.20 (9), p.e1012091</ispartof><rights>Copyright: © 2024 Dietler et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.</rights><rights>COPYRIGHT 2024 Public Library of Science</rights><rights>2024 Dietler et al 2024 Dietler et al</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c423t-5a7106b269ef95f93f1bc30e6b51c9d3a58fa6e86a5f18bb08f4a33ee649f33c3</cites><orcidid>0000-0003-1020-494X</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC11449291/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC11449291/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,723,776,780,860,881,2915,27901,27902,53766,53768</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/39312591$$D View this record in 
MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><contributor>Weigt, Martin</contributor><creatorcontrib>Dietler, Nicola</creatorcontrib><creatorcontrib>Abbara, Alia</creatorcontrib><creatorcontrib>Choudhury, Subham</creatorcontrib><creatorcontrib>Bitbol, Anne-Florence</creatorcontrib><title>Impact of phylogeny on the inference of functional sectors from protein sequence data</title><title>PLoS computational biology</title><addtitle>PLoS Comput Biol</addtitle><description>Statistical analysis of multiple sequence alignments of homologous proteins has revealed groups of coevolving amino acids called sectors. These groups of amino-acid sites feature collective correlations in their amino-acid usage, and they are associated to functional properties. Modeling showed that nonlinear selection on an additive functional trait of a protein is generically expected to give rise to a functional sector. These modeling results motivated a principled method, called ICOD, which is designed to identify functional sectors, as well as mutational effects, from sequence data. However, a challenge for all methods aiming to identify sectors from multiple sequence alignments is that correlations in amino-acid usage can also arise from the mere fact that homologous sequences share common ancestry, i.e. from phylogeny. Here, we generate controlled synthetic data from a minimal model comprising both phylogeny and functional sectors. We use this data to dissect the impact of phylogeny on sector identification and on mutational effect inference by different methods. We find that ICOD is most robust to phylogeny, but that conservation is also quite robust. Next, we consider natural multiple sequence alignments of protein families for which deep mutational scan experimental data is available. We show that in this natural data, conservation and ICOD best identify sites with strong functional roles, in agreement with our results on synthetic data. 
Importantly, these two methods have different premises, since they respectively focus on conservation and on correlations. Thus, their joint use can reveal complementary information.</description><subject>Amino acid sequence</subject><subject>Biology and Life Sciences</subject><subject>Computer and Information Sciences</subject><subject>Data mining</subject><subject>Ecology and Environmental Sciences</subject><subject>Methods</subject><subject>Phylogeny</subject><subject>Physical Sciences</subject><subject>Protein research</subject><subject>Proteins</subject><subject>Statistical models</subject><subject>Structure</subject><issn>1553-7358</issn><issn>1553-734X</issn><issn>1553-7358</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><recordid>eNqVkk9rGzEQxUVpaNK036CUhV7ag13NaiWvTiWE_jGEBNrmLLTyyFbYlbaSNtTfvnLshhh6KTpomPm9JzQ8Qt4AnQNbwMe7MEWv-_loOjcHCjWV8IycAedstmC8ff6kPiUvU7qjtJRSvCCnTDKouYQzcrscRm1yFWw1brZ9WKPfVsFXeYOV8xYjeoO7qZ28yS6UF6uEJoeYKhvDUI0xZHS-NH9ND-xKZ_2KnFjdJ3x9uM_J7ZfPPy-_za5uvi4vL65mpqlZnnG9ACq6Wki0klvJLHSGURQdByNXTPPWaoGt0NxC23W0tY1mDFE00jJm2Dn5tPcdp27AlUGfo-7VGN2g41YF7dTxxLuNWod7BdA0spZQHN4fHGIoH0hZDS4Z7HvtMUxJMaDtQjAuZEHf7dG17lGV5YRiaXa4umiLIQfJ60LN_0GVs8LBmeDRutI_Enw4EhQm4--81lNKavnj-3-w18dss2dNDClFtI9rAap2CVKHBKldgtQhQUX29ulKH0V_I8P-AKAwxEY</recordid><startdate>20240923</startdate><enddate>20240923</enddate><creator>Dietler, Nicola</creator><creator>Abbara, Alia</creator><creator>Choudhury, Subham</creator><creator>Bitbol, Anne-Florence</creator><general>Public Library of Science</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>ISN</scope><scope>ISR</scope><scope>7X8</scope><scope>5PM</scope><orcidid>https://orcid.org/0000-0003-1020-494X</orcidid></search><sort><creationdate>20240923</creationdate><title>Impact of phylogeny on the inference of functional sectors from protein sequence data</title><author>Dietler, Nicola ; Abbara, Alia ; Choudhury, 
Subham ; Bitbol, Anne-Florence</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c423t-5a7106b269ef95f93f1bc30e6b51c9d3a58fa6e86a5f18bb08f4a33ee649f33c3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Amino acid sequence</topic><topic>Biology and Life Sciences</topic><topic>Computer and Information Sciences</topic><topic>Data mining</topic><topic>Ecology and Environmental Sciences</topic><topic>Methods</topic><topic>Phylogeny</topic><topic>Physical Sciences</topic><topic>Protein research</topic><topic>Proteins</topic><topic>Statistical models</topic><topic>Structure</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Dietler, Nicola</creatorcontrib><creatorcontrib>Abbara, Alia</creatorcontrib><creatorcontrib>Choudhury, Subham</creatorcontrib><creatorcontrib>Bitbol, Anne-Florence</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>Gale In Context: Canada</collection><collection>Gale In Context: Science</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>PLoS computational biology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Dietler, Nicola</au><au>Abbara, Alia</au><au>Choudhury, Subham</au><au>Bitbol, Anne-Florence</au><au>Weigt, Martin</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Impact of phylogeny on the inference of functional sectors from protein sequence data</atitle><jtitle>PLoS computational biology</jtitle><addtitle>PLoS Comput Biol</addtitle><date>2024-09-23</date><risdate>2024</risdate><volume>20</volume><issue>9</issue><spage>e1012091</spage><pages>e1012091-</pages><issn>1553-7358</issn><issn>1553-734X</issn><eissn>1553-7358</eissn><abstract>Statistical analysis of 
multiple sequence alignments of homologous proteins has revealed groups of coevolving amino acids called sectors. These groups of amino-acid sites feature collective correlations in their amino-acid usage, and they are associated to functional properties. Modeling showed that nonlinear selection on an additive functional trait of a protein is generically expected to give rise to a functional sector. These modeling results motivated a principled method, called ICOD, which is designed to identify functional sectors, as well as mutational effects, from sequence data. However, a challenge for all methods aiming to identify sectors from multiple sequence alignments is that correlations in amino-acid usage can also arise from the mere fact that homologous sequences share common ancestry, i.e. from phylogeny. Here, we generate controlled synthetic data from a minimal model comprising both phylogeny and functional sectors. We use this data to dissect the impact of phylogeny on sector identification and on mutational effect inference by different methods. We find that ICOD is most robust to phylogeny, but that conservation is also quite robust. Next, we consider natural multiple sequence alignments of protein families for which deep mutational scan experimental data is available. We show that in this natural data, conservation and ICOD best identify sites with strong functional roles, in agreement with our results on synthetic data. Importantly, these two methods have different premises, since they respectively focus on conservation and on correlations. Thus, their joint use can reveal complementary information.</abstract><cop>United States</cop><pub>Public Library of Science</pub><pmid>39312591</pmid><doi>10.1371/journal.pcbi.1012091</doi><tpages>e1012091</tpages><orcidid>https://orcid.org/0000-0003-1020-494X</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1553-7358 |
ispartof | PLoS computational biology, 2024-09, Vol.20 (9), p.e1012091 |
issn | 1553-7358 1553-734X 1553-7358 |
language | eng |
recordid | cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_11449291 |
source | Public Library of Science (PLoS) Journals Open Access; DOAJ Directory of Open Access Journals; EZB-FREE-00999 freely available EZB journals; PubMed Central |
subjects | Amino acid sequence ; Biology and Life Sciences ; Computer and Information Sciences ; Data mining ; Ecology and Environmental Sciences ; Methods ; Phylogeny ; Physical Sciences ; Protein research ; Proteins ; Statistical models ; Structure |
title | Impact of phylogeny on the inference of functional sectors from protein sequence data |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-29T14%3A07%3A58IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Impact%20of%20phylogeny%20on%20the%20inference%20of%20functional%20sectors%20from%20protein%20sequence%20data&rft.jtitle=PLoS%20computational%20biology&rft.au=Dietler,%20Nicola&rft.date=2024-09-23&rft.volume=20&rft.issue=9&rft.spage=e1012091&rft.pages=e1012091-&rft.issn=1553-7358&rft.eissn=1553-7358&rft_id=info:doi/10.1371/journal.pcbi.1012091&rft_dat=%3Cgale_pubme%3EA811451952%3C/gale_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3108763569&rft_id=info:pmid/39312591&rft_galeid=A811451952&rfr_iscdi=true |