Impact of phylogeny on the inference of functional sectors from protein sequence data

Statistical analysis of multiple sequence alignments of homologous proteins has revealed groups of coevolving amino acids called sectors. These groups of amino-acid sites feature collective correlations in their amino-acid usage, and they are associated to functional properties. Modeling showed that...

Detailed description

Bibliographic details
Published in: PLoS computational biology 2024-09, Vol.20 (9), p.e1012091
Main authors: Dietler, Nicola; Abbara, Alia; Choudhury, Subham; Bitbol, Anne-Florence
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page
container_issue 9
container_start_page e1012091
container_title PLoS computational biology
container_volume 20
creator Dietler, Nicola
Abbara, Alia
Choudhury, Subham
Bitbol, Anne-Florence
description Statistical analysis of multiple sequence alignments of homologous proteins has revealed groups of coevolving amino acids called sectors. These groups of amino-acid sites feature collective correlations in their amino-acid usage, and they are associated to functional properties. Modeling showed that nonlinear selection on an additive functional trait of a protein is generically expected to give rise to a functional sector. These modeling results motivated a principled method, called ICOD, which is designed to identify functional sectors, as well as mutational effects, from sequence data. However, a challenge for all methods aiming to identify sectors from multiple sequence alignments is that correlations in amino-acid usage can also arise from the mere fact that homologous sequences share common ancestry, i.e. from phylogeny. Here, we generate controlled synthetic data from a minimal model comprising both phylogeny and functional sectors. We use this data to dissect the impact of phylogeny on sector identification and on mutational effect inference by different methods. We find that ICOD is most robust to phylogeny, but that conservation is also quite robust. Next, we consider natural multiple sequence alignments of protein families for which deep mutational scan experimental data is available. We show that in this natural data, conservation and ICOD best identify sites with strong functional roles, in agreement with our results on synthetic data. Importantly, these two methods have different premises, since they respectively focus on conservation and on correlations. Thus, their joint use can reveal complementary information.
doi_str_mv 10.1371/journal.pcbi.1012091
format Article
fullrecord <record><control><sourceid>gale_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_11449291</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A811451952</galeid><sourcerecordid>A811451952</sourcerecordid><originalsourceid>FETCH-LOGICAL-c423t-5a7106b269ef95f93f1bc30e6b51c9d3a58fa6e86a5f18bb08f4a33ee649f33c3</originalsourceid><addsrcrecordid>eNqVkk9rGzEQxUVpaNK036CUhV7ag13NaiWvTiWE_jGEBNrmLLTyyFbYlbaSNtTfvnLshhh6KTpomPm9JzQ8Qt4AnQNbwMe7MEWv-_loOjcHCjWV8IycAedstmC8ff6kPiUvU7qjtJRSvCCnTDKouYQzcrscRm1yFWw1brZ9WKPfVsFXeYOV8xYjeoO7qZ28yS6UF6uEJoeYKhvDUI0xZHS-NH9ND-xKZ_2KnFjdJ3x9uM_J7ZfPPy-_za5uvi4vL65mpqlZnnG9ACq6Wki0klvJLHSGURQdByNXTPPWaoGt0NxC23W0tY1mDFE00jJm2Dn5tPcdp27AlUGfo-7VGN2g41YF7dTxxLuNWod7BdA0spZQHN4fHGIoH0hZDS4Z7HvtMUxJMaDtQjAuZEHf7dG17lGV5YRiaXa4umiLIQfJ60LN_0GVs8LBmeDRutI_Enw4EhQm4--81lNKavnj-3-w18dss2dNDClFtI9rAap2CVKHBKldgtQhQUX29ulKH0V_I8P-AKAwxEY</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3108763569</pqid></control><display><type>article</type><title>Impact of phylogeny on the inference of functional sectors from protein sequence data</title><source>Public Library of Science (PLoS) Journals Open Access</source><source>DOAJ Directory of Open Access Journals</source><source>EZB-FREE-00999 freely available EZB journals</source><source>PubMed Central</source><creator>Dietler, Nicola ; Abbara, Alia ; Choudhury, Subham ; Bitbol, Anne-Florence</creator><contributor>Weigt, Martin</contributor><creatorcontrib>Dietler, Nicola ; Abbara, Alia ; Choudhury, Subham ; Bitbol, Anne-Florence ; Weigt, Martin</creatorcontrib><description>Statistical analysis of multiple sequence alignments of homologous proteins has revealed groups of coevolving amino acids called sectors. These groups of amino-acid sites feature collective correlations in their amino-acid usage, and they are associated to functional properties. 
Modeling showed that nonlinear selection on an additive functional trait of a protein is generically expected to give rise to a functional sector. These modeling results motivated a principled method, called ICOD, which is designed to identify functional sectors, as well as mutational effects, from sequence data. However, a challenge for all methods aiming to identify sectors from multiple sequence alignments is that correlations in amino-acid usage can also arise from the mere fact that homologous sequences share common ancestry, i.e. from phylogeny. Here, we generate controlled synthetic data from a minimal model comprising both phylogeny and functional sectors. We use this data to dissect the impact of phylogeny on sector identification and on mutational effect inference by different methods. We find that ICOD is most robust to phylogeny, but that conservation is also quite robust. Next, we consider natural multiple sequence alignments of protein families for which deep mutational scan experimental data is available. We show that in this natural data, conservation and ICOD best identify sites with strong functional roles, in agreement with our results on synthetic data. Importantly, these two methods have different premises, since they respectively focus on conservation and on correlations. 
Thus, their joint use can reveal complementary information.</description><identifier>ISSN: 1553-7358</identifier><identifier>ISSN: 1553-734X</identifier><identifier>EISSN: 1553-7358</identifier><identifier>DOI: 10.1371/journal.pcbi.1012091</identifier><identifier>PMID: 39312591</identifier><language>eng</language><publisher>United States: Public Library of Science</publisher><subject>Amino acid sequence ; Biology and Life Sciences ; Computer and Information Sciences ; Data mining ; Ecology and Environmental Sciences ; Methods ; Phylogeny ; Physical Sciences ; Protein research ; Proteins ; Statistical models ; Structure</subject><ispartof>PLoS computational biology, 2024-09, Vol.20 (9), p.e1012091</ispartof><rights>Copyright: © 2024 Dietler et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.</rights><rights>COPYRIGHT 2024 Public Library of Science</rights><rights>2024 Dietler et al 2024 Dietler et al</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c423t-5a7106b269ef95f93f1bc30e6b51c9d3a58fa6e86a5f18bb08f4a33ee649f33c3</cites><orcidid>0000-0003-1020-494X</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC11449291/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC11449291/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,723,776,780,860,881,2915,27901,27902,53766,53768</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/39312591$$D View this record in 
MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><contributor>Weigt, Martin</contributor><creatorcontrib>Dietler, Nicola</creatorcontrib><creatorcontrib>Abbara, Alia</creatorcontrib><creatorcontrib>Choudhury, Subham</creatorcontrib><creatorcontrib>Bitbol, Anne-Florence</creatorcontrib><title>Impact of phylogeny on the inference of functional sectors from protein sequence data</title><title>PLoS computational biology</title><addtitle>PLoS Comput Biol</addtitle><description>Statistical analysis of multiple sequence alignments of homologous proteins has revealed groups of coevolving amino acids called sectors. These groups of amino-acid sites feature collective correlations in their amino-acid usage, and they are associated to functional properties. Modeling showed that nonlinear selection on an additive functional trait of a protein is generically expected to give rise to a functional sector. These modeling results motivated a principled method, called ICOD, which is designed to identify functional sectors, as well as mutational effects, from sequence data. However, a challenge for all methods aiming to identify sectors from multiple sequence alignments is that correlations in amino-acid usage can also arise from the mere fact that homologous sequences share common ancestry, i.e. from phylogeny. Here, we generate controlled synthetic data from a minimal model comprising both phylogeny and functional sectors. We use this data to dissect the impact of phylogeny on sector identification and on mutational effect inference by different methods. We find that ICOD is most robust to phylogeny, but that conservation is also quite robust. Next, we consider natural multiple sequence alignments of protein families for which deep mutational scan experimental data is available. We show that in this natural data, conservation and ICOD best identify sites with strong functional roles, in agreement with our results on synthetic data. 
Importantly, these two methods have different premises, since they respectively focus on conservation and on correlations. Thus, their joint use can reveal complementary information.</description><subject>Amino acid sequence</subject><subject>Biology and Life Sciences</subject><subject>Computer and Information Sciences</subject><subject>Data mining</subject><subject>Ecology and Environmental Sciences</subject><subject>Methods</subject><subject>Phylogeny</subject><subject>Physical Sciences</subject><subject>Protein research</subject><subject>Proteins</subject><subject>Statistical models</subject><subject>Structure</subject><issn>1553-7358</issn><issn>1553-734X</issn><issn>1553-7358</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><recordid>eNqVkk9rGzEQxUVpaNK036CUhV7ag13NaiWvTiWE_jGEBNrmLLTyyFbYlbaSNtTfvnLshhh6KTpomPm9JzQ8Qt4AnQNbwMe7MEWv-_loOjcHCjWV8IycAedstmC8ff6kPiUvU7qjtJRSvCCnTDKouYQzcrscRm1yFWw1brZ9WKPfVsFXeYOV8xYjeoO7qZ28yS6UF6uEJoeYKhvDUI0xZHS-NH9ND-xKZ_2KnFjdJ3x9uM_J7ZfPPy-_za5uvi4vL65mpqlZnnG9ACq6Wki0klvJLHSGURQdByNXTPPWaoGt0NxC23W0tY1mDFE00jJm2Dn5tPcdp27AlUGfo-7VGN2g41YF7dTxxLuNWod7BdA0spZQHN4fHGIoH0hZDS4Z7HvtMUxJMaDtQjAuZEHf7dG17lGV5YRiaXa4umiLIQfJ60LN_0GVs8LBmeDRutI_Enw4EhQm4--81lNKavnj-3-w18dss2dNDClFtI9rAap2CVKHBKldgtQhQUX29ulKH0V_I8P-AKAwxEY</recordid><startdate>20240923</startdate><enddate>20240923</enddate><creator>Dietler, Nicola</creator><creator>Abbara, Alia</creator><creator>Choudhury, Subham</creator><creator>Bitbol, Anne-Florence</creator><general>Public Library of Science</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>ISN</scope><scope>ISR</scope><scope>7X8</scope><scope>5PM</scope><orcidid>https://orcid.org/0000-0003-1020-494X</orcidid></search><sort><creationdate>20240923</creationdate><title>Impact of phylogeny on the inference of functional sectors from protein sequence data</title><author>Dietler, Nicola ; Abbara, Alia ; Choudhury, 
Subham ; Bitbol, Anne-Florence</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c423t-5a7106b269ef95f93f1bc30e6b51c9d3a58fa6e86a5f18bb08f4a33ee649f33c3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Amino acid sequence</topic><topic>Biology and Life Sciences</topic><topic>Computer and Information Sciences</topic><topic>Data mining</topic><topic>Ecology and Environmental Sciences</topic><topic>Methods</topic><topic>Phylogeny</topic><topic>Physical Sciences</topic><topic>Protein research</topic><topic>Proteins</topic><topic>Statistical models</topic><topic>Structure</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Dietler, Nicola</creatorcontrib><creatorcontrib>Abbara, Alia</creatorcontrib><creatorcontrib>Choudhury, Subham</creatorcontrib><creatorcontrib>Bitbol, Anne-Florence</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>Gale In Context: Canada</collection><collection>Gale In Context: Science</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>PLoS computational biology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Dietler, Nicola</au><au>Abbara, Alia</au><au>Choudhury, Subham</au><au>Bitbol, Anne-Florence</au><au>Weigt, Martin</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Impact of phylogeny on the inference of functional sectors from protein sequence data</atitle><jtitle>PLoS computational biology</jtitle><addtitle>PLoS Comput Biol</addtitle><date>2024-09-23</date><risdate>2024</risdate><volume>20</volume><issue>9</issue><spage>e1012091</spage><pages>e1012091-</pages><issn>1553-7358</issn><issn>1553-734X</issn><eissn>1553-7358</eissn><abstract>Statistical analysis of 
multiple sequence alignments of homologous proteins has revealed groups of coevolving amino acids called sectors. These groups of amino-acid sites feature collective correlations in their amino-acid usage, and they are associated to functional properties. Modeling showed that nonlinear selection on an additive functional trait of a protein is generically expected to give rise to a functional sector. These modeling results motivated a principled method, called ICOD, which is designed to identify functional sectors, as well as mutational effects, from sequence data. However, a challenge for all methods aiming to identify sectors from multiple sequence alignments is that correlations in amino-acid usage can also arise from the mere fact that homologous sequences share common ancestry, i.e. from phylogeny. Here, we generate controlled synthetic data from a minimal model comprising both phylogeny and functional sectors. We use this data to dissect the impact of phylogeny on sector identification and on mutational effect inference by different methods. We find that ICOD is most robust to phylogeny, but that conservation is also quite robust. Next, we consider natural multiple sequence alignments of protein families for which deep mutational scan experimental data is available. We show that in this natural data, conservation and ICOD best identify sites with strong functional roles, in agreement with our results on synthetic data. Importantly, these two methods have different premises, since they respectively focus on conservation and on correlations. Thus, their joint use can reveal complementary information.</abstract><cop>United States</cop><pub>Public Library of Science</pub><pmid>39312591</pmid><doi>10.1371/journal.pcbi.1012091</doi><tpages>e1012091</tpages><orcidid>https://orcid.org/0000-0003-1020-494X</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1553-7358
ispartof PLoS computational biology, 2024-09, Vol.20 (9), p.e1012091
issn 1553-7358
1553-734X
1553-7358
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_11449291
source Public Library of Science (PLoS) Journals Open Access; DOAJ Directory of Open Access Journals; EZB-FREE-00999 freely available EZB journals; PubMed Central
subjects Amino acid sequence
Biology and Life Sciences
Computer and Information Sciences
Data mining
Ecology and Environmental Sciences
Methods
Phylogeny
Physical Sciences
Protein research
Proteins
Statistical models
Structure
title Impact of phylogeny on the inference of functional sectors from protein sequence data
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-29T14%3A07%3A58IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Impact%20of%20phylogeny%20on%20the%20inference%20of%20functional%20sectors%20from%20protein%20sequence%20data&rft.jtitle=PLoS%20computational%20biology&rft.au=Dietler,%20Nicola&rft.date=2024-09-23&rft.volume=20&rft.issue=9&rft.spage=e1012091&rft.pages=e1012091-&rft.issn=1553-7358&rft.eissn=1553-7358&rft_id=info:doi/10.1371/journal.pcbi.1012091&rft_dat=%3Cgale_pubme%3EA811451952%3C/gale_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3108763569&rft_id=info:pmid/39312591&rft_galeid=A811451952&rfr_iscdi=true