A Nonparametric Method for Accommodating and Testing Across-Site Rate Variation
Substitution rates are one of the most fundamental parameters in a phylogenetic analysis and are represented in phylogenetic models as the branch lengths on a tree. Variation in substitution rates across an alignment of molecular sequences is well established and likely caused by variation in functi...
Published in: | Systematic biology 2007-12, Vol.56 (6), p.975-987 |
---|---|
Main authors: | Huelsenbeck, John P.; Suchard, Marc A.; Buckley, Thomas |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 987 |
---|---|
container_issue | 6 |
container_start_page | 975 |
container_title | Systematic biology |
container_volume | 56 |
creator | Huelsenbeck, John P.; Suchard, Marc A.; Buckley, Thomas |
description | Substitution rates are one of the most fundamental parameters in a phylogenetic analysis and are represented in phylogenetic models as the branch lengths on a tree. Variation in substitution rates across an alignment of molecular sequences is well established and likely caused by variation in functional constraint across the genes encoded in the sequences. Rate variation across alignment sites is important to accommodate in a phylogenetic analysis; failure to account for across-site rate variation can cause biased estimates of phylogeny or other model parameters. Traditionally, rate variation across sites has been modeled by treating the rate for a site as a random variable drawn from some probability distribution (such as the gamma probability distribution) or by partitioning sites to different rate classes and estimating the rate for each class independently. We consider a different approach, related to site-specific models in which sites are partitioned to rate classes. However, instead of treating the partitioning scheme in which sites are assigned to rate classes as a fixed assumption of the analysis, we treat the rate partitioning as a random variable under a Dirichlet process prior. We find that the Dirichlet process prior model for across-site rate variation fits alignments of DNA sequence data better than commonly used models of across-site rate variation. The method appears to identify the underlying codon structure of protein-coding genes; rate partitions that were sampled by the Markov chain Monte Carlo procedure were closer to a partition in which sites are assigned to rate classes by codon position than to randomly permuted partitions but still allow for additional variability across sites. |
doi_str_mv | 10.1080/10635150701670569 |
format | Article |
fullrecord | PMID: 18075934; EISSN: 1076-836X; publisher: Society of Systematic Zoology (England); pages: 13; peer reviewed; free to read; rights: Copyright 2007 Society of Systematic Biologists |
fulltext | fulltext |
identifier | ISSN: 1063-5157 |
ispartof | Systematic biology, 2007-12, Vol.56 (6), p.975-987 |
issn | 1063-5157 (ISSN); 1076-836X (EISSN) |
language | eng |
recordid | cdi_proquest_miscellaneous_69056617 |
source | Jstor Complete Legacy; Oxford University Press Journals All Titles (1996-Current); MEDLINE |
subjects | Across-site rate variation; Bayesian estimation; Classification - methods; Codons; Deoxyribonucleic acid; Dirichlet process prior; DNA; Genetic variation; Genetics; Markov analysis; Markov chain Monte Carlo; Markov chains; Modeling; Models, Genetic; Molecular structure; Nucleotide sequences; Nucleotides; Parametric models; Phylogenetics; Phylogeny; Probability distributions; Random variables; Statistics, Nonparametric; Taxonomy |
title | A Nonparametric Method for Accommodating and Testing Across-Site Rate Variation |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-29T23%3A03%3A38IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-jstor_proqu&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Nonparametric%20Method%20for%20Accommodating%20and%20Testing%20Across-Site%20Rate%20Variation&rft.jtitle=Systematic%20biology&rft.au=Huelsenbeck,%20John%20P.&rft.date=2007-12&rft.volume=56&rft.issue=6&rft.spage=975&rft.epage=987&rft.pages=975-987&rft.issn=1063-5157&rft.eissn=1076-836X&rft_id=info:doi/10.1080/10635150701670569&rft_dat=%3Cjstor_proqu%3E20143106%3C/jstor_proqu%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=194781112&rft_id=info:pmid/18075934&rft_jstor_id=20143106&rft_oup_id=10.1080/10635150701670569&rfr_iscdi=true |
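
The description says the assignment of sites to rate classes is treated as a random variable under a Dirichlet process prior. As a rough illustration of what a draw from such a prior can look like, the sketch below samples a partition with the Chinese restaurant process, a standard construction of the Dirichlet process. It is not the authors' implementation and does not reproduce their Markov chain Monte Carlo procedure: the function names, the concentration parameter `alpha`, and the gamma base distribution for class rates are all assumptions made for this example.

```python
import random


def sample_rate_partition(n_sites, alpha, rng):
    """Draw one partition of sites into rate classes from a Chinese
    restaurant process with concentration parameter alpha (assumed name)."""
    assignments = []  # assignments[i] is the rate class of site i
    class_sizes = []  # class_sizes[k] is the number of sites in class k
    for _ in range(n_sites):
        # An existing class is chosen with weight equal to its size;
        # a brand-new class is opened with weight alpha.
        weights = class_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(class_sizes):
            class_sizes.append(1)
        else:
            class_sizes[k] += 1
        assignments.append(k)
    return assignments, class_sizes


def draw_class_rates(n_classes, rng, shape=1.0):
    """Draw one rate per class from a gamma distribution with mean 1.
    The gamma base distribution is an assumption for illustration only."""
    return [rng.gammavariate(shape, 1.0 / shape) for _ in range(n_classes)]


rng = random.Random(0)
assignments, class_sizes = sample_rate_partition(30, alpha=1.0, rng=rng)
class_rates = draw_class_rates(len(class_sizes), rng)
site_rates = [class_rates[k] for k in assignments]
print(len(class_sizes), "rate classes for", len(assignments), "sites")
```

Larger values of `alpha` tend to produce more rate classes, so neither the number of classes nor the assignment of sites to them is fixed in advance. That is the sense in which the partition is a random variable rather than a fixed assumption of the analysis.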