A general framework for moment-based analysis of genetic data
In population genetics, the Dirichlet (also called the Balding–Nichols) model has for 20 years been considered the key model to approximate the distribution of allele fractions within populations in a multi-allelic setting. It has often been noted that the Dirichlet assumption is approximate because...
Published in: | Journal of mathematical biology 2019-05, Vol.78 (6), p.1727-1769 |
---|---|
Main authors: | Speed, Maria Simonsen ; Balding, David Joseph ; Hobolth, Asger |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 1769 |
---|---|
container_issue | 6 |
container_start_page | 1727 |
container_title | Journal of mathematical biology |
container_volume | 78 |
creator | Speed, Maria Simonsen ; Balding, David Joseph ; Hobolth, Asger |
description | In population genetics, the Dirichlet (also called the Balding–Nichols) model has for 20 years been considered the key model to approximate the distribution of allele fractions within populations in a multi-allelic setting. It has often been noted that the Dirichlet assumption is approximate because positive correlations among alleles cannot be accommodated under the Dirichlet model. However, the validity of the Dirichlet distribution has never been systematically investigated in a general framework. This paper attempts to address this problem by providing a general overview of how allele fraction data under the most common multi-allelic mutational structures should be modeled. The Dirichlet and alternative models are investigated by simulating allele fractions from a diffusion approximation of the multi-allelic Wright–Fisher process with mutation, and applying a moment-based analysis method. The study shows that the optimal modeling strategy for the distribution of allele fractions depends on the specific mutation process. The Dirichlet model is only an exceptionally good approximation for the pure drift, Jukes–Cantor and parent-independent mutation processes with small mutation rates. Alternative models are required and proposed for the other mutation processes, such as a Beta–Dirichlet model for the infinite alleles mutation process, and a Hierarchical Beta model for the Kimura, Hasegawa–Kishino–Yano and Tamura–Nei processes. Finally, a novel Hierarchical Beta approximation is developed, a Pyramidal Hierarchical Beta model, for the generalized time-reversible and single-step mutation processes. |
doi_str_mv | 10.1007/s00285-018-01325-0 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_2210231983</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2177016523</sourcerecordid><originalsourceid>FETCH-LOGICAL-c375t-b6c7316179fef88471fe2dd8a604196986beb2e9af68d43fb297eef538dd45923</originalsourceid><addsrcrecordid>eNp9kMtKAzEUhoMotlZfwIUMuHEzek4yk2QWLkrxBgU3ug6ZmZPSOpeatEjf3rT1Ai5chATy_f85fIydI1wjgLoJAFznKaCOR_D4OmBDzARPMUN5yIYgQKRSIx-wkxAWAKjyAo_ZQIASGSg1ZLfjZEYdedskztuWPnr_lrjeJ23fUrdKSxuoTmxnm02Yh6R3O3w1r5LaruwpO3K2CXT2dY_Y6_3dy-QxnT4_PE3G07QSKo8lslICJarCkdM6U-iI17W2EjIsZKFlSSWnwjqp60y4kheKyOVC13WWF1yM2NW-d-n79zWFlWnnoaKmsR3162A4R-ACCy0ievkHXfRrH_ePFCoFKHO-pfieqnwfgidnln7eWr8xCGYr1-zlmijX7OQaiKGLr-p12VL9E_m2GQGxB0L86mbkf2f_U_sJdyCCrw</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2177016523</pqid></control><display><type>article</type><title>A general framework for moment-based analysis of genetic data</title><source>SpringerLink Journals - AutoHoldings</source><creator>Speed, Maria Simonsen ; Balding, David Joseph ; Hobolth, Asger</creator><creatorcontrib>Speed, Maria Simonsen ; Balding, David Joseph ; Hobolth, Asger</creatorcontrib><description>In population genetics, the Dirichlet (also called the Balding–Nichols) model has for 20 years been considered the key model to approximate the distribution of allele fractions within populations in a multi-allelic setting. It has often been noted that the Dirichlet assumption is approximate because positive correlations among alleles cannot be accommodated under the Dirichlet model. However, the validity of the Dirichlet distribution has never been systematically investigated in a general framework. This paper attempts to address this problem by providing a general overview of how allele fraction data under the most common multi-allelic mutational structures should be modeled. 
The Dirichlet and alternative models are investigated by simulating allele fractions from a diffusion approximation of the multi-allelic Wright–Fisher process with mutation, and applying a moment-based analysis method. The study shows that the optimal modeling strategy for the distribution of allele fractions depends on the specific mutation process. The Dirichlet model is only an exceptionally good approximation for the pure drift, Jukes–Cantor and parent-independent mutation processes with small mutation rates. Alternative models are required and proposed for the other mutation processes, such as a Beta–Dirichlet model for the infinite alleles mutation process, and a Hierarchical Beta model for the Kimura, Hasegawa–Kishino–Yano and Tamura–Nei processes. Finally, a novel Hierarchical Beta approximation is developed, a Pyramidal Hierarchical Beta model, for the generalized time-reversible and single-step mutation processes.</description><identifier>ISSN: 0303-6812</identifier><identifier>EISSN: 1432-1416</identifier><identifier>DOI: 10.1007/s00285-018-01325-0</identifier><identifier>PMID: 30734077</identifier><language>eng</language><publisher>Berlin/Heidelberg: Springer Berlin Heidelberg</publisher><subject>Alleles ; Applications of Mathematics ; Approximation ; Computer simulation ; Data processing ; Dirichlet problem ; Genetic analysis ; Genetics ; Mathematical analysis ; Mathematical and Computational Biology ; Mathematics ; Mathematics and Statistics ; Mutation ; Mutation rates ; Population genetics</subject><ispartof>Journal of mathematical biology, 2019-05, Vol.78 (6), p.1727-1769</ispartof><rights>Springer-Verlag GmbH Germany, part of Springer Nature 2019</rights><rights>Journal of Mathematical Biology is a copyright of Springer, (2019). 
All Rights Reserved.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c375t-b6c7316179fef88471fe2dd8a604196986beb2e9af68d43fb297eef538dd45923</citedby><cites>FETCH-LOGICAL-c375t-b6c7316179fef88471fe2dd8a604196986beb2e9af68d43fb297eef538dd45923</cites><orcidid>0000-0002-3356-2080 ; 0000-0003-4056-1286 ; 0000-0002-1480-6115</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s00285-018-01325-0$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s00285-018-01325-0$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>314,776,780,27901,27902,41464,42533,51294</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/30734077$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Speed, Maria Simonsen</creatorcontrib><creatorcontrib>Balding, David Joseph</creatorcontrib><creatorcontrib>Hobolth, Asger</creatorcontrib><title>A general framework for moment-based analysis of genetic data</title><title>Journal of mathematical biology</title><addtitle>J. Math. Biol</addtitle><addtitle>J Math Biol</addtitle><description>In population genetics, the Dirichlet (also called the Balding–Nichols) model has for 20 years been considered the key model to approximate the distribution of allele fractions within populations in a multi-allelic setting. It has often been noted that the Dirichlet assumption is approximate because positive correlations among alleles cannot be accommodated under the Dirichlet model. However, the validity of the Dirichlet distribution has never been systematically investigated in a general framework. 
This paper attempts to address this problem by providing a general overview of how allele fraction data under the most common multi-allelic mutational structures should be modeled. The Dirichlet and alternative models are investigated by simulating allele fractions from a diffusion approximation of the multi-allelic Wright–Fisher process with mutation, and applying a moment-based analysis method. The study shows that the optimal modeling strategy for the distribution of allele fractions depends on the specific mutation process. The Dirichlet model is only an exceptionally good approximation for the pure drift, Jukes–Cantor and parent-independent mutation processes with small mutation rates. Alternative models are required and proposed for the other mutation processes, such as a Beta–Dirichlet model for the infinite alleles mutation process, and a Hierarchical Beta model for the Kimura, Hasegawa–Kishino–Yano and Tamura–Nei processes. Finally, a novel Hierarchical Beta approximation is developed, a Pyramidal Hierarchical Beta model, for the generalized time-reversible and single-step mutation processes.</description><subject>Alleles</subject><subject>Applications of Mathematics</subject><subject>Approximation</subject><subject>Computer simulation</subject><subject>Data processing</subject><subject>Dirichlet problem</subject><subject>Genetic analysis</subject><subject>Genetics</subject><subject>Mathematical analysis</subject><subject>Mathematical and Computational Biology</subject><subject>Mathematics</subject><subject>Mathematics and Statistics</subject><subject>Mutation</subject><subject>Mutation rates</subject><subject>Population 
genetics</subject><issn>0303-6812</issn><issn>1432-1416</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2019</creationdate><recordtype>article</recordtype><sourceid>BENPR</sourceid><recordid>eNp9kMtKAzEUhoMotlZfwIUMuHEzek4yk2QWLkrxBgU3ug6ZmZPSOpeatEjf3rT1Ai5chATy_f85fIydI1wjgLoJAFznKaCOR_D4OmBDzARPMUN5yIYgQKRSIx-wkxAWAKjyAo_ZQIASGSg1ZLfjZEYdedskztuWPnr_lrjeJ23fUrdKSxuoTmxnm02Yh6R3O3w1r5LaruwpO3K2CXT2dY_Y6_3dy-QxnT4_PE3G07QSKo8lslICJarCkdM6U-iI17W2EjIsZKFlSSWnwjqp60y4kheKyOVC13WWF1yM2NW-d-n79zWFlWnnoaKmsR3162A4R-ACCy0ievkHXfRrH_ePFCoFKHO-pfieqnwfgidnln7eWr8xCGYr1-zlmijX7OQaiKGLr-p12VL9E_m2GQGxB0L86mbkf2f_U_sJdyCCrw</recordid><startdate>20190501</startdate><enddate>20190501</enddate><creator>Speed, Maria Simonsen</creator><creator>Balding, David Joseph</creator><creator>Hobolth, Asger</creator><general>Springer Berlin Heidelberg</general><general>Springer Nature B.V</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7TK</scope><scope>7TM</scope><scope>7U9</scope><scope>7X7</scope><scope>7XB</scope><scope>88A</scope><scope>88E</scope><scope>8AO</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FH</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>BHPHI</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FR3</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>GNUQQ</scope><scope>H94</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>K9.</scope><scope>L6V</scope><scope>LK8</scope><scope>M0S</scope><scope>M1P</scope><scope>M7N</scope><scope>M7P</scope><scope>M7S</scope><scope>M7Z</scope><scope>P5Z</scope><scope>P62</scope><scope>P64</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-00
02-3356-2080</orcidid><orcidid>https://orcid.org/0000-0003-4056-1286</orcidid><orcidid>https://orcid.org/0000-0002-1480-6115</orcidid></search><sort><creationdate>20190501</creationdate><title>A general framework for moment-based analysis of genetic data</title><author>Speed, Maria Simonsen ; Balding, David Joseph ; Hobolth, Asger</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c375t-b6c7316179fef88471fe2dd8a604196986beb2e9af68d43fb297eef538dd45923</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2019</creationdate><topic>Alleles</topic><topic>Applications of Mathematics</topic><topic>Approximation</topic><topic>Computer simulation</topic><topic>Data processing</topic><topic>Dirichlet problem</topic><topic>Genetic analysis</topic><topic>Genetics</topic><topic>Mathematical analysis</topic><topic>Mathematical and Computational Biology</topic><topic>Mathematics</topic><topic>Mathematics and Statistics</topic><topic>Mutation</topic><topic>Mutation rates</topic><topic>Population genetics</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Speed, Maria Simonsen</creatorcontrib><creatorcontrib>Balding, David Joseph</creatorcontrib><creatorcontrib>Hobolth, Asger</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>Neurosciences Abstracts</collection><collection>Nucleic Acids Abstracts</collection><collection>Virology and AIDS Abstracts</collection><collection>Health & Medical Collection</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Biology Database (Alumni Edition)</collection><collection>Medical Database (Alumni Edition)</collection><collection>ProQuest Pharma Collection</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest 
Technology Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>Materials Science & Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies & Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>Biological Science Collection</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>Natural Science Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>Engineering Research Database</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Central Student</collection><collection>AIDS and Cancer Research Abstracts</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>ProQuest Health & Medical Complete (Alumni)</collection><collection>ProQuest Engineering Collection</collection><collection>ProQuest Biological Science Collection</collection><collection>Health & Medical Collection (Alumni Edition)</collection><collection>Medical Database</collection><collection>Algology Mycology and Protozoology Abstracts (Microbiology C)</collection><collection>Biological Science Database</collection><collection>Engineering Database</collection><collection>Biochemistry Abstracts 1</collection><collection>Advanced Technologies & Aerospace Database</collection><collection>ProQuest Advanced Technologies & Aerospace 
Collection</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection><collection>MEDLINE - Academic</collection><jtitle>Journal of mathematical biology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Speed, Maria Simonsen</au><au>Balding, David Joseph</au><au>Hobolth, Asger</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A general framework for moment-based analysis of genetic data</atitle><jtitle>Journal of mathematical biology</jtitle><stitle>J. Math. Biol</stitle><addtitle>J Math Biol</addtitle><date>2019-05-01</date><risdate>2019</risdate><volume>78</volume><issue>6</issue><spage>1727</spage><epage>1769</epage><pages>1727-1769</pages><issn>0303-6812</issn><eissn>1432-1416</eissn><abstract>In population genetics, the Dirichlet (also called the Balding–Nichols) model has for 20 years been considered the key model to approximate the distribution of allele fractions within populations in a multi-allelic setting. It has often been noted that the Dirichlet assumption is approximate because positive correlations among alleles cannot be accommodated under the Dirichlet model. However, the validity of the Dirichlet distribution has never been systematically investigated in a general framework. This paper attempts to address this problem by providing a general overview of how allele fraction data under the most common multi-allelic mutational structures should be modeled. 
The Dirichlet and alternative models are investigated by simulating allele fractions from a diffusion approximation of the multi-allelic Wright–Fisher process with mutation, and applying a moment-based analysis method. The study shows that the optimal modeling strategy for the distribution of allele fractions depends on the specific mutation process. The Dirichlet model is only an exceptionally good approximation for the pure drift, Jukes–Cantor and parent-independent mutation processes with small mutation rates. Alternative models are required and proposed for the other mutation processes, such as a Beta–Dirichlet model for the infinite alleles mutation process, and a Hierarchical Beta model for the Kimura, Hasegawa–Kishino–Yano and Tamura–Nei processes. Finally, a novel Hierarchical Beta approximation is developed, a Pyramidal Hierarchical Beta model, for the generalized time-reversible and single-step mutation processes.</abstract><cop>Berlin/Heidelberg</cop><pub>Springer Berlin Heidelberg</pub><pmid>30734077</pmid><doi>10.1007/s00285-018-01325-0</doi><tpages>43</tpages><orcidid>https://orcid.org/0000-0002-3356-2080</orcidid><orcidid>https://orcid.org/0000-0003-4056-1286</orcidid><orcidid>https://orcid.org/0000-0002-1480-6115</orcidid></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0303-6812 |
ispartof | Journal of mathematical biology, 2019-05, Vol.78 (6), p.1727-1769 |
issn | 0303-6812 ; 1432-1416 |
language | eng |
recordid | cdi_proquest_miscellaneous_2210231983 |
source | SpringerLink Journals - AutoHoldings |
subjects | Alleles ; Applications of Mathematics ; Approximation ; Computer simulation ; Data processing ; Dirichlet problem ; Genetic analysis ; Genetics ; Mathematical analysis ; Mathematical and Computational Biology ; Mathematics ; Mathematics and Statistics ; Mutation ; Mutation rates ; Population genetics |
title | A general framework for moment-based analysis of genetic data |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-04T21%3A44%3A23IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20general%20framework%20for%20moment-based%20analysis%20of%20genetic%20data&rft.jtitle=Journal%20of%20mathematical%20biology&rft.au=Speed,%20Maria%20Simonsen&rft.date=2019-05-01&rft.volume=78&rft.issue=6&rft.spage=1727&rft.epage=1769&rft.pages=1727-1769&rft.issn=0303-6812&rft.eissn=1432-1416&rft_id=info:doi/10.1007/s00285-018-01325-0&rft_dat=%3Cproquest_cross%3E2177016523%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2177016523&rft_id=info:pmid/30734077&rfr_iscdi=true |
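
The description's remark that positive correlations among alleles cannot be accommodated under the Dirichlet model can be checked directly from the Dirichlet moments. The following is a minimal sketch, not the authors' code: it assumes the standard Balding–Nichols parameterization alpha_i = p_i (1 - F) / F, and the function names, the example frequencies `p` and the value `F = 0.1` are hypothetical choices made for illustration. It compares the exact Dirichlet covariances with moment estimates from simulated allele fractions.

```python
import random


def balding_nichols_alpha(p, F):
    """Dirichlet parameters alpha_i = p_i * (1 - F) / F (assumed parameterization)."""
    scale = (1.0 - F) / F
    return [p_i * scale for p_i in p]


def dirichlet_covariance(alpha):
    """Exact covariance matrix of a Dirichlet(alpha) vector of allele fractions."""
    a0 = sum(alpha)
    denom = a0 * a0 * (a0 + 1.0)
    k = len(alpha)
    return [
        [
            (alpha[i] * (a0 - alpha[i]) if i == j else -alpha[i] * alpha[j]) / denom
            for j in range(k)
        ]
        for i in range(k)
    ]


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet(alpha) vector by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def empirical_covariance(samples):
    """Moment-based (sample) covariance matrix of a list of fraction vectors."""
    n = len(samples)
    k = len(samples[0])
    means = [sum(s[i] for s in samples) / n for i in range(k)]
    return [
        [
            sum((s[i] - means[i]) * (s[j] - means[j]) for s in samples) / (n - 1)
            for j in range(k)
        ]
        for i in range(k)
    ]


if __name__ == "__main__":
    p = [0.1, 0.2, 0.3, 0.4]  # hypothetical ancestral allele frequencies
    F = 0.1                   # hypothetical differentiation parameter
    alpha = balding_nichols_alpha(p, F)
    exact = dirichlet_covariance(alpha)

    rng = random.Random(1)
    samples = [sample_dirichlet(alpha, rng) for _ in range(20000)]
    estimated = empirical_covariance(samples)

    for i in range(len(p)):
        for j in range(i + 1, len(p)):
            print(f"cov({i},{j}): exact={exact[i][j]:.5f} estimated={estimated[i][j]:.5f}")
```

Under this parameterization every off-diagonal covariance equals -F p_i p_j, so no choice of parameters yields a positive correlation between two allele fractions.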