Site-Specific Amino Acid Distributions Follow a Universal Shape
In many applications of evolutionary inference, a model of protein evolution needs to be fitted to the amino acid variation at individual sites in a multiple sequence alignment. Most existing models fall into one of two extremes: Either they provide a coarse-grained description that lacks biophysica...
Published in: | Journal of molecular evolution 2020-12, Vol.88 (10), p.731-741 |
---|---|
Main authors: | Johnson, Mackenzie M. ; Wilke, Claus O. |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 741 |
---|---|
container_issue | 10 |
container_start_page | 731 |
container_title | Journal of molecular evolution |
container_volume | 88 |
creator | Johnson, Mackenzie M. ; Wilke, Claus O. |
description | In many applications of evolutionary inference, a model of protein evolution needs to be fitted to the amino acid variation at individual sites in a multiple sequence alignment. Most existing models fall into one of two extremes: Either they provide a coarse-grained description that lacks biophysical realism (e.g., dN/dS models), or they require a large number of parameters to be fitted (e.g., mutation–selection models). Here, we ask whether a middle ground is possible: Can we obtain a realistic description of site-specific amino acid frequencies while severely restricting the number of free parameters in the model? We show that a distribution with a single free parameter can accurately capture the variation in amino acid frequency at most sites in an alignment, as long as we are willing to restrict our analysis to predicting amino acid frequencies by rank rather than by amino acid identity. This result holds equally well both in alignments of empirical protein sequences and of sequences evolved under a biophysically realistic all-atom force field. Our analysis reveals a near universal shape of the frequency distributions of amino acids. This insight has the potential to lead to new models of evolution that have both increased realism and a limited number of free parameters. |
doi_str_mv | 10.1007/s00239-020-09976-8 |
format | Article |
fullrecord | <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_7717668</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2464148446</sourcerecordid><originalsourceid>FETCH-LOGICAL-c474t-4689572ff2732e29664a74e5417297356c4138a884a608cad17dda8be6a434d3</originalsourceid><addsrcrecordid>eNp9kTtPwzAUhS0EoqXwBxhQJBYWg1-1nQVUFQpISAyF2XIdB1ylcbGTIv49LoHyGFjuHc7nc-_1AeAQo1OMkDiLCBGaQ0QQRHkuOJRboI8ZJXBdtkE_6QQSyVgP7MU4RwiLYU53QY9SQhHnrA8upq6xcLq0xpXOZKOFq302Mq7ILl1sgpu1jfN1zCa-qvxrprPH2q1siLrKps96affBTqmraA8--wA8TK4exjfw7v76djy6g4YJ1kDGZT4UpCyJoMSSPM3Wgtkhw4Lkgg65YZhKLSXTHEmjCyyKQsuZ5ZpRVtABOO9sl-1sYQtj6yboSi2DW-jwprx26rdSu2f15FdKCCw4l8ng5NMg-JfWxkYtXDS2qnRtfRsVYZxhlr6KJ_T4Dzr3bajTdYkSlOYSc5oo0lEm-BiDLTfLYKTW8aguHpXiUR_xqPUWRz_P2Dz5yiMBtANikuonG75n_2P7Dg4LmWQ</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2473398163</pqid></control><display><type>article</type><title>Site-Specific Amino Acid Distributions Follow a Universal Shape</title><source>MEDLINE</source><source>SpringerLink_现刊</source><creator>Johnson, Mackenzie M. ; Wilke, Claus O.</creator><creatorcontrib>Johnson, Mackenzie M. ; Wilke, Claus O.</creatorcontrib><description>In many applications of evolutionary inference, a model of protein evolution needs to be fitted to the amino acid variation at individual sites in a multiple sequence alignment. Most existing models fall into one of two extremes: Either they provide a coarse-grained description that lacks biophysical realism (e.g.,
dN/dS
models), or they require a large number of parameters to be fitted (e.g., mutation–selection models). Here, we ask whether a middle ground is possible: Can we obtain a realistic description of site-specific amino acid frequencies while severely restricting the number of free parameters in the model? We show that a distribution with a single free parameter can accurately capture the variation in amino acid frequency at most sites in an alignment, as long as we are willing to restrict our analysis to predicting amino acid frequencies by rank rather than by amino acid identity. This result holds equally well both in alignments of empirical protein sequences and of sequences evolved under a biophysically realistic all-atom force field. Our analysis reveals a near universal shape of the frequency distributions of amino acids. This insight has the potential to lead to new models of evolution that have both increased realism and a limited number of free parameters.</description><identifier>ISSN: 0022-2844</identifier><identifier>EISSN: 1432-1432</identifier><identifier>DOI: 10.1007/s00239-020-09976-8</identifier><identifier>PMID: 33230664</identifier><language>eng</language><publisher>New York: Springer US</publisher><subject>Alignment ; Amino Acid Sequence ; Amino Acid Substitution ; Amino acids ; Amino Acids - genetics ; Animal Genetics and Genomics ; Biomedical and Life Sciences ; Cell Biology ; Empirical analysis ; Evolution ; Evolution, Molecular ; Evolutionary Biology ; Life Sciences ; Mathematical models ; Microbiology ; Models, Genetic ; Mutation ; Nucleotide sequence ; Original Article ; Parameters ; Plant Genetics and Genomics ; Plant Sciences ; Proteins ; Realism ; Sequence Alignment</subject><ispartof>Journal of molecular evolution, 2020-12, Vol.88 (10), p.731-741</ispartof><rights>Springer Science+Business Media, LLC, part of Springer Nature 2020</rights><rights>Springer Science+Business Media, LLC, part of Springer Nature 
2020.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c474t-4689572ff2732e29664a74e5417297356c4138a884a608cad17dda8be6a434d3</citedby><cites>FETCH-LOGICAL-c474t-4689572ff2732e29664a74e5417297356c4138a884a608cad17dda8be6a434d3</cites><orcidid>0000-0002-3915-2023 ; 0000-0002-7470-9261</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s00239-020-09976-8$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s00239-020-09976-8$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>230,314,780,784,885,27924,27925,41488,42557,51319</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/33230664$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Johnson, Mackenzie M.</creatorcontrib><creatorcontrib>Wilke, Claus O.</creatorcontrib><title>Site-Specific Amino Acid Distributions Follow a Universal Shape</title><title>Journal of molecular evolution</title><addtitle>J Mol Evol</addtitle><addtitle>J Mol Evol</addtitle><description>In many applications of evolutionary inference, a model of protein evolution needs to be fitted to the amino acid variation at individual sites in a multiple sequence alignment. Most existing models fall into one of two extremes: Either they provide a coarse-grained description that lacks biophysical realism (e.g.,
dN/dS
models), or they require a large number of parameters to be fitted (e.g., mutation–selection models). Here, we ask whether a middle ground is possible: Can we obtain a realistic description of site-specific amino acid frequencies while severely restricting the number of free parameters in the model? We show that a distribution with a single free parameter can accurately capture the variation in amino acid frequency at most sites in an alignment, as long as we are willing to restrict our analysis to predicting amino acid frequencies by rank rather than by amino acid identity. This result holds equally well both in alignments of empirical protein sequences and of sequences evolved under a biophysically realistic all-atom force field. Our analysis reveals a near universal shape of the frequency distributions of amino acids. This insight has the potential to lead to new models of evolution that have both increased realism and a limited number of free parameters.</description><subject>Alignment</subject><subject>Amino Acid Sequence</subject><subject>Amino Acid Substitution</subject><subject>Amino acids</subject><subject>Amino Acids - genetics</subject><subject>Animal Genetics and Genomics</subject><subject>Biomedical and Life Sciences</subject><subject>Cell Biology</subject><subject>Empirical analysis</subject><subject>Evolution</subject><subject>Evolution, Molecular</subject><subject>Evolutionary Biology</subject><subject>Life Sciences</subject><subject>Mathematical models</subject><subject>Microbiology</subject><subject>Models, Genetic</subject><subject>Mutation</subject><subject>Nucleotide sequence</subject><subject>Original Article</subject><subject>Parameters</subject><subject>Plant Genetics and Genomics</subject><subject>Plant Sciences</subject><subject>Proteins</subject><subject>Realism</subject><subject>Sequence 
Alignment</subject><issn>0022-2844</issn><issn>1432-1432</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><sourceid>8G5</sourceid><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><sourceid>GUQSH</sourceid><sourceid>M2O</sourceid><recordid>eNp9kTtPwzAUhS0EoqXwBxhQJBYWg1-1nQVUFQpISAyF2XIdB1ylcbGTIv49LoHyGFjuHc7nc-_1AeAQo1OMkDiLCBGaQ0QQRHkuOJRboI8ZJXBdtkE_6QQSyVgP7MU4RwiLYU53QY9SQhHnrA8upq6xcLq0xpXOZKOFq302Mq7ILl1sgpu1jfN1zCa-qvxrprPH2q1siLrKps96affBTqmraA8--wA8TK4exjfw7v76djy6g4YJ1kDGZT4UpCyJoMSSPM3Wgtkhw4Lkgg65YZhKLSXTHEmjCyyKQsuZ5ZpRVtABOO9sl-1sYQtj6yboSi2DW-jwprx26rdSu2f15FdKCCw4l8ng5NMg-JfWxkYtXDS2qnRtfRsVYZxhlr6KJ_T4Dzr3bajTdYkSlOYSc5oo0lEm-BiDLTfLYKTW8aguHpXiUR_xqPUWRz_P2Dz5yiMBtANikuonG75n_2P7Dg4LmWQ</recordid><startdate>20201201</startdate><enddate>20201201</enddate><creator>Johnson, Mackenzie M.</creator><creator>Wilke, Claus O.</creator><general>Springer US</general><general>Springer Nature 
B.V</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7QL</scope><scope>7QP</scope><scope>7QR</scope><scope>7T7</scope><scope>7TK</scope><scope>7U9</scope><scope>7X7</scope><scope>7XB</scope><scope>88A</scope><scope>88E</scope><scope>8AO</scope><scope>8FD</scope><scope>8FE</scope><scope>8FH</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>8G5</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BHPHI</scope><scope>C1K</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FR3</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>GNUQQ</scope><scope>GUQSH</scope><scope>H94</scope><scope>HCIFZ</scope><scope>K9.</scope><scope>LK8</scope><scope>M0S</scope><scope>M1P</scope><scope>M2O</scope><scope>M7N</scope><scope>M7P</scope><scope>MBDVC</scope><scope>P64</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>Q9U</scope><scope>RC3</scope><scope>7X8</scope><scope>5PM</scope><orcidid>https://orcid.org/0000-0002-3915-2023</orcidid><orcidid>https://orcid.org/0000-0002-7470-9261</orcidid></search><sort><creationdate>20201201</creationdate><title>Site-Specific Amino Acid Distributions Follow a Universal Shape</title><author>Johnson, Mackenzie M. 
; Wilke, Claus O.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c474t-4689572ff2732e29664a74e5417297356c4138a884a608cad17dda8be6a434d3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Alignment</topic><topic>Amino Acid Sequence</topic><topic>Amino Acid Substitution</topic><topic>Amino acids</topic><topic>Amino Acids - genetics</topic><topic>Animal Genetics and Genomics</topic><topic>Biomedical and Life Sciences</topic><topic>Cell Biology</topic><topic>Empirical analysis</topic><topic>Evolution</topic><topic>Evolution, Molecular</topic><topic>Evolutionary Biology</topic><topic>Life Sciences</topic><topic>Mathematical models</topic><topic>Microbiology</topic><topic>Models, Genetic</topic><topic>Mutation</topic><topic>Nucleotide sequence</topic><topic>Original Article</topic><topic>Parameters</topic><topic>Plant Genetics and Genomics</topic><topic>Plant Sciences</topic><topic>Proteins</topic><topic>Realism</topic><topic>Sequence Alignment</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Johnson, Mackenzie M.</creatorcontrib><creatorcontrib>Wilke, Claus O.</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>Bacteriology Abstracts (Microbiology B)</collection><collection>Calcium & Calcified Tissue Abstracts</collection><collection>Chemoreception Abstracts</collection><collection>Industrial and Applied Microbiology Abstracts (Microbiology A)</collection><collection>Neurosciences Abstracts</collection><collection>Virology and AIDS Abstracts</collection><collection>ProQuest_Health & Medical Collection</collection><collection>ProQuest Central (purchase 
pre-March 2016)</collection><collection>Biology Database (Alumni Edition)</collection><collection>Medical Database (Alumni Edition)</collection><collection>ProQuest Pharma Collection</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>Research Library (Alumni Edition)</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest Central</collection><collection>ProQuest Central Essentials</collection><collection>Biological Science Collection</collection><collection>AUTh Library subscriptions: ProQuest Central</collection><collection>ProQuest Natural Science Collection</collection><collection>Environmental Sciences and Pollution Management</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>Engineering Research Database</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Central Student</collection><collection>Research Library Prep</collection><collection>AIDS and Cancer Research Abstracts</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Health & Medical Complete (Alumni)</collection><collection>ProQuest Biological Science Collection</collection><collection>Health & Medical Collection (Alumni Edition)</collection><collection>PML(ProQuest Medical Library)</collection><collection>ProQuest_Research Library</collection><collection>Algology Mycology and Protozoology Abstracts (Microbiology C)</collection><collection>ProQuest Biological Science Journals</collection><collection>Research Library 
(Corporate)</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central Basic</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Journal of molecular evolution</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Johnson, Mackenzie M.</au><au>Wilke, Claus O.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Site-Specific Amino Acid Distributions Follow a Universal Shape</atitle><jtitle>Journal of molecular evolution</jtitle><stitle>J Mol Evol</stitle><addtitle>J Mol Evol</addtitle><date>2020-12-01</date><risdate>2020</risdate><volume>88</volume><issue>10</issue><spage>731</spage><epage>741</epage><pages>731-741</pages><issn>0022-2844</issn><eissn>1432-1432</eissn><abstract>In many applications of evolutionary inference, a model of protein evolution needs to be fitted to the amino acid variation at individual sites in a multiple sequence alignment. Most existing models fall into one of two extremes: Either they provide a coarse-grained description that lacks biophysical realism (e.g.,
dN/dS
models), or they require a large number of parameters to be fitted (e.g., mutation–selection models). Here, we ask whether a middle ground is possible: Can we obtain a realistic description of site-specific amino acid frequencies while severely restricting the number of free parameters in the model? We show that a distribution with a single free parameter can accurately capture the variation in amino acid frequency at most sites in an alignment, as long as we are willing to restrict our analysis to predicting amino acid frequencies by rank rather than by amino acid identity. This result holds equally well both in alignments of empirical protein sequences and of sequences evolved under a biophysically realistic all-atom force field. Our analysis reveals a near universal shape of the frequency distributions of amino acids. This insight has the potential to lead to new models of evolution that have both increased realism and a limited number of free parameters.</abstract><cop>New York</cop><pub>Springer US</pub><pmid>33230664</pmid><doi>10.1007/s00239-020-09976-8</doi><tpages>11</tpages><orcidid>https://orcid.org/0000-0002-3915-2023</orcidid><orcidid>https://orcid.org/0000-0002-7470-9261</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0022-2844 |
ispartof | Journal of molecular evolution, 2020-12, Vol.88 (10), p.731-741 |
issn | 0022-2844 ; 1432-1432 |
language | eng |
recordid | cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_7717668 |
source | MEDLINE; SpringerLink_现刊 |
subjects | Alignment ; Amino Acid Sequence ; Amino Acid Substitution ; Amino acids ; Amino Acids - genetics ; Animal Genetics and Genomics ; Biomedical and Life Sciences ; Cell Biology ; Empirical analysis ; Evolution ; Evolution, Molecular ; Evolutionary Biology ; Life Sciences ; Mathematical models ; Microbiology ; Models, Genetic ; Mutation ; Nucleotide sequence ; Original Article ; Parameters ; Plant Genetics and Genomics ; Plant Sciences ; Proteins ; Realism ; Sequence Alignment |
title | Site-Specific Amino Acid Distributions Follow a Universal Shape |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-25T10%3A41%3A18IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Site-Specific%20Amino%20Acid%20Distributions%20Follow%20a%20Universal%20Shape&rft.jtitle=Journal%20of%20molecular%20evolution&rft.au=Johnson,%20Mackenzie%20M.&rft.date=2020-12-01&rft.volume=88&rft.issue=10&rft.spage=731&rft.epage=741&rft.pages=731-741&rft.issn=0022-2844&rft.eissn=1432-1432&rft_id=info:doi/10.1007/s00239-020-09976-8&rft_dat=%3Cproquest_pubme%3E2464148446%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2473398163&rft_id=info:pmid/33230664&rfr_iscdi=true |