A Nonparametric Method for Accommodating and Testing Across-Site Rate Variation

Substitution rates are one of the most fundamental parameters in a phylogenetic analysis and are represented in phylogenetic models as the branch lengths on a tree. Variation in substitution rates across an alignment of molecular sequences is well established and likely caused by variation in functi...

Bibliographic details
Published in: Systematic biology 2007-12, Vol.56 (6), p.975-987
Main authors: Huelsenbeck, John P., Suchard, Marc A., Buckley, Thomas
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 987
container_issue 6
container_start_page 975
container_title Systematic biology
container_volume 56
creator Huelsenbeck, John P.
Suchard, Marc A.
Buckley, Thomas
description Substitution rates are one of the most fundamental parameters in a phylogenetic analysis and are represented in phylogenetic models as the branch lengths on a tree. Variation in substitution rates across an alignment of molecular sequences is well established and likely caused by variation in functional constraint across the genes encoded in the sequences. Rate variation across alignment sites is important to accommodate in a phylogenetic analysis; failure to account for across-site rate variation can cause biased estimates of phylogeny or other model parameters. Traditionally, rate variation across sites has been modeled by treating the rate for a site as a random variable drawn from some probability distribution (such as the gamma probability distribution) or by partitioning sites to different rate classes and estimating the rate for each class independently. We consider a different approach, related to site-specific models in which sites are partitioned to rate classes. However, instead of treating the partitioning scheme in which sites are assigned to rate classes as a fixed assumption of the analysis, we treat the rate partitioning as a random variable under a Dirichlet process prior. We find that the Dirichlet process prior model for across-site rate variation fits alignments of DNA sequence data better than commonly used models of across-site rate variation. The method appears to identify the underlying codon structure of protein-coding genes; rate partitions that were sampled by the Markov chain Monte Carlo procedure were closer to a partition in which sites are assigned to rate classes by codon position than to randomly permuted partitions but still allow for additional variability across sites.
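The description above treats the assignment of sites to rate classes as a random variable under a Dirichlet process prior. A minimal sketch of what such a prior over partitions looks like is given below, using the Chinese restaurant process, a standard sequential construction of the partition distribution induced by a Dirichlet process. The function names, the concentration value `alpha=1.0`, and the 12-site example are illustrative assumptions; nothing here reproduces the authors' implementation, their prior on the rates themselves, or their Markov chain Monte Carlo sampler.

```python
import random


def sample_crp_partition(n_sites, alpha, rng):
    """Draw one partition of n_sites sites from a Chinese restaurant process.

    alpha is the concentration parameter (must be > 0); larger values
    favor more rate classes. Returns a list in which entry i is the
    rate-class index of site i.
    """
    assignments = []  # assignments[i] = rate class of site i
    class_sizes = []  # class_sizes[k] = number of sites currently in class k
    for _ in range(n_sites):
        # A site joins existing class k with weight class_sizes[k],
        # or opens a new class with weight alpha.
        weights = class_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(class_sizes):
            class_sizes.append(1)
        else:
            class_sizes[k] += 1
        assignments.append(k)
    return assignments


def codon_position_partition(n_sites):
    """Fixed reference partition: sites grouped by codon position (0, 1, 2)."""
    return [i % 3 for i in range(n_sites)]


# Hypothetical usage: one prior draw for a 12-site alignment.
rng = random.Random(1)
partition = sample_crp_partition(12, alpha=1.0, rng=rng)
reference = codon_position_partition(12)
```

Each draw is one candidate partitioning of sites into rate classes; the number of classes is not fixed in advance, which is what distinguishes this prior from a fixed codon-position scheme such as `reference`.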
doi_str_mv 10.1080/10635150701670569
format Article
fullrecord <record><control><sourceid>jstor_proqu</sourceid><recordid>TN_cdi_proquest_miscellaneous_69056617</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><jstor_id>20143106</jstor_id><oup_id>10.1080/10635150701670569</oup_id><sourcerecordid>20143106</sourcerecordid><originalsourceid>FETCH-LOGICAL-c532t-81250142ea7bc5f873cb0d9df275b44783cca25a56dc61321a3243d653c918f63</originalsourceid><addsrcrecordid>eNqNkMtOwzAQRS0E4lH4ABagiAUrAh47tpNlVfFUoRIvoW4s13EgpYmLnUjw97ikAgk2bGZGmnOvxxehXcDHgFN8AphTBgwLDFxgxrMVtAlY8Dil_Gl1MXMaB0BsoC3vpxgDcAbraANSLFhGk0006kc3tp4rpyrTuFJH16Z5sXlUWBf1tbZVZXPVlPVzpOo8ujf-a-5rZ72P78rGRLcqlEflyoDZehutFWrmzc6y99DD2en94CIejs4vB_1hrBklTZwCYRgSYpSYaFakguoJzrO8IIJNkkSkVGtFmGI81xwoAUVJQnPOqM4gLTjtocPOd-7sWxvOklXptZnNVG1s6yXPQh4cRAAPfoFT27o63CYhCw8BAAkQdNDXv5wp5NyVlXIfErBcRC3_RB00-0vjdlKZ_EexzDYARx1g2_m__PY6fOob674FJMREF3QPxd2-9I15_94r9yq5oILJi6exJNf45mp8S-SYfgJlgJwM</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>194781112</pqid></control><display><type>article</type><title>A Nonparametric Method for Accommodating and Testing Across-Site Rate Variation</title><source>Jstor Complete Legacy</source><source>Oxford University Press Journals All Titles (1996-Current)</source><source>MEDLINE</source><creator>Huelsenbeck, John P. ; Suchard, Marc A. ; Buckley, Thomas</creator><contributor>Buckley, Thomas</contributor><creatorcontrib>Huelsenbeck, John P. ; Suchard, Marc A. ; Buckley, Thomas ; Buckley, Thomas</creatorcontrib><description>Substitution rates are one of the most fundamental parameters in a phylogenetic analysis and are represented in phylogenetic models as the branch lengths on a tree. Variation in substitution rates across an alignment of molecular sequences is well established and likely caused by variation in functional constraint across the genes encoded in the sequences. 
Rate variation across alignment sites is important to accommodate in a phylogenetic analysis; failure to account for across-site rate variation can cause biased estimates of phylogeny or other model parameters. Traditionally, rate variation across sites has been modeled by treating the rate for a site as a random variable drawn from some probability distribution (such as the gamma probability distribution) or by partitioning sites to different rate classes and estimating the rate for each class independently. We consider a different approach, related to site-specific models in which sites are partitioned to rate classes. However, instead of treating the partitioning scheme in which sites are assigned to rate classes as a fixed assumption of the analysis, we treat the rate partitioning as a random variable under a Dirichlet process prior. We find that the Dirichlet process prior model for across-site rate variation fits alignments of DNA sequence data better than commonly used models of across-site rate variation. 
The method appears to identify the underlying codon structure of protein-coding genes; rate partitions that were sampled by the Markov chain Monte Carlo procedure were closer to a partition in which sites are assigned to rate classes by codon position than to randomly permuted partitions but still allow for additional variability across sites.</description><identifier>ISSN: 1063-5157</identifier><identifier>EISSN: 1076-836X</identifier><identifier>DOI: 10.1080/10635150701670569</identifier><identifier>PMID: 18075934</identifier><language>eng</language><publisher>England: Society of Systematic Zoology</publisher><subject>Across-site rate variation ; Bayesian estimation ; Classification - methods ; Codons ; Deoxyribonucleic acid ; Dirichlet process prior ; DNA ; Genetic variation ; Genetics ; Markov analysis ; Markov chain Monte Carlo ; Markov chains ; Modeling ; Models, Genetic ; Molecular structure ; Nucleotide sequences ; Nucleotides ; Parametric models ; Phylogenetics ; Phylogeny ; Probability distributions ; Random variables ; Statistics, Nonparametric ; Taxonomy</subject><ispartof>Systematic biology, 2007-12, Vol.56 (6), p.975-987</ispartof><rights>Copyright 2007 Society of Systematic Biologists</rights><rights>2007 Society of Systematic Biologists 2007</rights><rights>Copyright Taylor &amp; Francis Ltd. 
Dec 2007</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c532t-81250142ea7bc5f873cb0d9df275b44783cca25a56dc61321a3243d653c918f63</citedby><cites>FETCH-LOGICAL-c532t-81250142ea7bc5f873cb0d9df275b44783cca25a56dc61321a3243d653c918f63</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.jstor.org/stable/pdf/20143106$$EPDF$$P50$$Gjstor$$H</linktopdf><linktohtml>$$Uhttps://www.jstor.org/stable/20143106$$EHTML$$P50$$Gjstor$$H</linktohtml><link.rule.ids>314,776,780,799,27901,27902,57992,58225</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/18075934$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><contributor>Buckley, Thomas</contributor><creatorcontrib>Huelsenbeck, John P.</creatorcontrib><creatorcontrib>Suchard, Marc A.</creatorcontrib><creatorcontrib>Buckley, Thomas</creatorcontrib><title>A Nonparametric Method for Accommodating and Testing Across-Site Rate Variation</title><title>Systematic biology</title><addtitle>Syst Biol</addtitle><description>Substitution rates are one of the most fundamental parameters in a phylogenetic analysis and are represented in phylogenetic models as the branch lengths on a tree. Variation in substitution rates across an alignment of molecular sequences is well established and likely caused by variation in functional constraint across the genes encoded in the sequences. Rate variation across alignment sites is important to accommodate in a phylogenetic analysis; failure to account for across-site rate variation can cause biased estimates of phylogeny or other model parameters. 
Traditionally, rate variation across sites has been modeled by treating the rate for a site as a random variable drawn from some probability distribution (such as the gamma probability distribution) or by partitioning sites to different rate classes and estimating the rate for each class independently. We consider a different approach, related to site-specific models in which sites are partitioned to rate classes. However, instead of treating the partitioning scheme in which sites are assigned to rate classes as a fixed assumption of the analysis, we treat the rate partitioning as a random variable under a Dirichlet process prior. We find that the Dirichlet process prior model for across-site rate variation fits alignments of DNA sequence data better than commonly used models of across-site rate variation. The method appears to identify the underlying codon structure of protein-coding genes; rate partitions that were sampled by the Markov chain Monte Carlo procedure were closer to a partition in which sites are assigned to rate classes by codon position than to randomly permuted partitions but still allow for additional variability across sites.</description><subject>Across-site rate variation</subject><subject>Bayesian estimation</subject><subject>Classification - methods</subject><subject>Codons</subject><subject>Deoxyribonucleic acid</subject><subject>Dirichlet process prior</subject><subject>DNA</subject><subject>Genetic variation</subject><subject>Genetics</subject><subject>Markov analysis</subject><subject>Markov chain Monte Carlo</subject><subject>Markov chains</subject><subject>Modeling</subject><subject>Models, Genetic</subject><subject>Molecular structure</subject><subject>Nucleotide sequences</subject><subject>Nucleotides</subject><subject>Parametric models</subject><subject>Phylogenetics</subject><subject>Phylogeny</subject><subject>Probability distributions</subject><subject>Random variables</subject><subject>Statistics, 
Nonparametric</subject><subject>Taxonomy</subject><issn>1063-5157</issn><issn>1076-836X</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2007</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNqNkMtOwzAQRS0E4lH4ABagiAUrAh47tpNlVfFUoRIvoW4s13EgpYmLnUjw97ikAgk2bGZGmnOvxxehXcDHgFN8AphTBgwLDFxgxrMVtAlY8Dil_Gl1MXMaB0BsoC3vpxgDcAbraANSLFhGk0006kc3tp4rpyrTuFJH16Z5sXlUWBf1tbZVZXPVlPVzpOo8ujf-a-5rZ72P78rGRLcqlEflyoDZehutFWrmzc6y99DD2en94CIejs4vB_1hrBklTZwCYRgSYpSYaFakguoJzrO8IIJNkkSkVGtFmGI81xwoAUVJQnPOqM4gLTjtocPOd-7sWxvOklXptZnNVG1s6yXPQh4cRAAPfoFT27o63CYhCw8BAAkQdNDXv5wp5NyVlXIfErBcRC3_RB00-0vjdlKZ_EexzDYARx1g2_m__PY6fOob674FJMREF3QPxd2-9I15_94r9yq5oILJi6exJNf45mp8S-SYfgJlgJwM</recordid><startdate>200712</startdate><enddate>200712</enddate><creator>Huelsenbeck, John P.</creator><creator>Suchard, Marc A.</creator><creator>Buckley, Thomas</creator><general>Society of Systematic Zoology</general><general>Taylor &amp; Francis</general><general>Oxford University Press</general><scope>BSCLL</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>K9.</scope><scope>7X8</scope></search><sort><creationdate>200712</creationdate><title>A Nonparametric Method for Accommodating and Testing Across-Site Rate Variation</title><author>Huelsenbeck, John P. ; Suchard, Marc A. 
; Buckley, Thomas</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c532t-81250142ea7bc5f873cb0d9df275b44783cca25a56dc61321a3243d653c918f63</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2007</creationdate><topic>Across-site rate variation</topic><topic>Bayesian estimation</topic><topic>Classification - methods</topic><topic>Codons</topic><topic>Deoxyribonucleic acid</topic><topic>Dirichlet process prior</topic><topic>DNA</topic><topic>Genetic variation</topic><topic>Genetics</topic><topic>Markov analysis</topic><topic>Markov chain Monte Carlo</topic><topic>Markov chains</topic><topic>Modeling</topic><topic>Models, Genetic</topic><topic>Molecular structure</topic><topic>Nucleotide sequences</topic><topic>Nucleotides</topic><topic>Parametric models</topic><topic>Phylogenetics</topic><topic>Phylogeny</topic><topic>Probability distributions</topic><topic>Random variables</topic><topic>Statistics, Nonparametric</topic><topic>Taxonomy</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Huelsenbeck, John P.</creatorcontrib><creatorcontrib>Suchard, Marc A.</creatorcontrib><creatorcontrib>Buckley, Thomas</creatorcontrib><collection>Istex</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>MEDLINE - Academic</collection><jtitle>Systematic biology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Huelsenbeck, John P.</au><au>Suchard, Marc A.</au><au>Buckley, Thomas</au><au>Buckley, Thomas</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A Nonparametric Method for 
Accommodating and Testing Across-Site Rate Variation</atitle><jtitle>Systematic biology</jtitle><addtitle>Syst Biol</addtitle><date>2007-12</date><risdate>2007</risdate><volume>56</volume><issue>6</issue><spage>975</spage><epage>987</epage><pages>975-987</pages><issn>1063-5157</issn><eissn>1076-836X</eissn><abstract>Substitution rates are one of the most fundamental parameters in a phylogenetic analysis and are represented in phylogenetic models as the branch lengths on a tree. Variation in substitution rates across an alignment of molecular sequences is well established and likely caused by variation in functional constraint across the genes encoded in the sequences. Rate variation across alignment sites is important to accommodate in a phylogenetic analysis; failure to account for across-site rate variation can cause biased estimates of phylogeny or other model parameters. Traditionally, rate variation across sites has been modeled by treating the rate for a site as a random variable drawn from some probability distribution (such as the gamma probability distribution) or by partitioning sites to different rate classes and estimating the rate for each class independently. We consider a different approach, related to site-specific models in which sites are partitioned to rate classes. However, instead of treating the partitioning scheme in which sites are assigned to rate classes as a fixed assumption of the analysis, we treat the rate partitioning as a random variable under a Dirichlet process prior. We find that the Dirichlet process prior model for across-site rate variation fits alignments of DNA sequence data better than commonly used models of across-site rate variation. 
The method appears to identify the underlying codon structure of protein-coding genes; rate partitions that were sampled by the Markov chain Monte Carlo procedure were closer to a partition in which sites are assigned to rate classes by codon position than to randomly permuted partitions but still allow for additional variability across sites.</abstract><cop>England</cop><pub>Society of Systematic Zoology</pub><pmid>18075934</pmid><doi>10.1080/10635150701670569</doi><tpages>13</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1063-5157
ispartof Systematic biology, 2007-12, Vol.56 (6), p.975-987
issn 1063-5157
1076-836X
language eng
recordid cdi_proquest_miscellaneous_69056617
source Jstor Complete Legacy; Oxford University Press Journals All Titles (1996-Current); MEDLINE
subjects Across-site rate variation
Bayesian estimation
Classification - methods
Codons
Deoxyribonucleic acid
Dirichlet process prior
DNA
Genetic variation
Genetics
Markov analysis
Markov chain Monte Carlo
Markov chains
Modeling
Models, Genetic
Molecular structure
Nucleotide sequences
Nucleotides
Parametric models
Phylogenetics
Phylogeny
Probability distributions
Random variables
Statistics, Nonparametric
Taxonomy
title A Nonparametric Method for Accommodating and Testing Across-Site Rate Variation
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-29T23%3A03%3A38IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-jstor_proqu&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Nonparametric%20Method%20for%20Accommodating%20and%20Testing%20Across-Site%20Rate%20Variation&rft.jtitle=Systematic%20biology&rft.au=Huelsenbeck,%20John%20P.&rft.date=2007-12&rft.volume=56&rft.issue=6&rft.spage=975&rft.epage=987&rft.pages=975-987&rft.issn=1063-5157&rft.eissn=1076-836X&rft_id=info:doi/10.1080/10635150701670569&rft_dat=%3Cjstor_proqu%3E20143106%3C/jstor_proqu%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=194781112&rft_id=info:pmid/18075934&rft_jstor_id=20143106&rft_oup_id=10.1080/10635150701670569&rfr_iscdi=true