A general framework for moment-based analysis of genetic data

In population genetics, the Dirichlet (also called the Balding–Nichols) model has for 20 years been considered the key model to approximate the distribution of allele fractions within populations in a multi-allelic setting. It has often been noted that the Dirichlet assumption is approximate because...
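The Balding–Nichols model named above is commonly written as a Dirichlet distribution with concentration parameters α_i = p_i(1 − F)/F, where p_i is the ancestral fraction of allele i and F is the differentiation parameter. The short Python sketch below computes the exact Dirichlet means and covariances under that parameterization. The function names and the example values of p and F are hypothetical and were chosen only for illustration. The result is Var(X_i) = F p_i(1 − p_i) and Cov(X_i, X_j) = −F p_i p_j for every pair i ≠ j, so every pair of alleles is forced to covary negatively.

```python
import math


def balding_nichols_alphas(p, F):
    """Dirichlet concentration parameters alpha_i = p_i * (1 - F) / F."""
    if not 0.0 < F < 1.0:
        raise ValueError("F must lie strictly between 0 and 1")
    if not math.isclose(sum(p), 1.0):
        raise ValueError("ancestral allele fractions must sum to 1")
    scale = (1.0 - F) / F
    return [p_i * scale for p_i in p]


def dirichlet_moments(alpha):
    """Exact means and covariance matrix of a Dirichlet(alpha) vector."""
    a0 = sum(alpha)
    k = len(alpha)
    means = [a / a0 for a in alpha]
    denom = a0 * a0 * (a0 + 1.0)
    cov = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            if i == j:
                # Var(X_i) = alpha_i (a0 - alpha_i) / (a0^2 (a0 + 1))
                cov[i][j] = alpha[i] * (a0 - alpha[i]) / denom
            else:
                # Cov(X_i, X_j) = -alpha_i alpha_j / (a0^2 (a0 + 1)), always negative
                cov[i][j] = -alpha[i] * alpha[j] / denom
    return means, cov


# Hypothetical example: four alleles with ancestral fractions p and F = 0.1.
p = [0.4, 0.3, 0.2, 0.1]
F = 0.1
means, cov = dirichlet_moments(balding_nichols_alphas(p, F))
# Here Var(X_i) = F p_i (1 - p_i) and Cov(X_i, X_j) = -F p_i p_j,
# so no pair of alleles can have a positive covariance.
```

Because the α_i sum to (1 − F)/F, the factor 1/(a0 + 1) reduces to F. That is why F appears directly as a variance scale in the Balding–Nichols form.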

Bibliographic details
Published in: Journal of mathematical biology 2019-05, Vol.78 (6), p.1727-1769
Main authors: Speed, Maria Simonsen; Balding, David Joseph; Hobolth, Asger
Format: Article
Language: English
Subjects:
Online access: Full text
container_end_page 1769
container_issue 6
container_start_page 1727
container_title Journal of mathematical biology
container_volume 78
creator Speed, Maria Simonsen
Balding, David Joseph
Hobolth, Asger
description In population genetics, the Dirichlet (also called the Balding–Nichols) model has for 20 years been considered the key model to approximate the distribution of allele fractions within populations in a multi-allelic setting. It has often been noted that the Dirichlet assumption is approximate because positive correlations among alleles cannot be accommodated under the Dirichlet model. However, the validity of the Dirichlet distribution has never been systematically investigated in a general framework. This paper attempts to address this problem by providing a general overview of how allele fraction data under the most common multi-allelic mutational structures should be modeled. The Dirichlet and alternative models are investigated by simulating allele fractions from a diffusion approximation of the multi-allelic Wright–Fisher process with mutation, and applying a moment-based analysis method. The study shows that the optimal modeling strategy for the distribution of allele fractions depends on the specific mutation process. The Dirichlet model is only an exceptionally good approximation for the pure drift, Jukes–Cantor and parent-independent mutation processes with small mutation rates. Alternative models are required and proposed for the other mutation processes, such as a Beta–Dirichlet model for the infinite alleles mutation process, and a Hierarchical Beta model for the Kimura, Hasegawa–Kishino–Yano and Tamura–Nei processes. Finally, a novel Hierarchical Beta approximation is developed, a Pyramidal Hierarchical Beta model, for the generalized time-reversible and single-step mutation processes.
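The abstract claims that a Hierarchical Beta model can capture correlations the Dirichlet cannot. This is easiest to see in the covariance of two alleles that share a branch of the hierarchy. The Python sketch below assumes that "Kimura" refers to Kimura's two-parameter nucleotide model, which separates transitions from transversions. It also assumes a generic two-level nested-Beta construction: a purine total R ~ Beta(a, b) is split between A and G by an independent U ~ Beta(c, d). This record does not give the paper's actual parameterization. The function names and parameter values are hypothetical.

```python
def beta_mean(a, b):
    """E[X] for X ~ Beta(a, b)."""
    return a / (a + b)


def beta_second_moment(a, b):
    """E[X^2] for X ~ Beta(a, b)."""
    return a * (a + 1) / ((a + b) * (a + b + 1))


def cov_within_group(a, b, c, d):
    """Cov(X_A, X_G) for the nested construction
    X_A = R * U and X_G = R * (1 - U), where
    R ~ Beta(a, b) is the total purine fraction and
    U ~ Beta(c, d) is the share of A among purines (R, U independent)."""
    e_r = beta_mean(a, b)
    e_r2 = beta_second_moment(a, b)
    e_u = beta_mean(c, d)
    e_u_1mu = e_u - beta_second_moment(c, d)  # E[U(1 - U)]
    # E[X_A X_G] - E[X_A] E[X_G], using independence of R and U
    return e_r2 * e_u_1mu - e_r ** 2 * e_u * (1 - e_u)


# Hypothetical values chosen only to show both signs:
# a widely spread purine total with a tight A/G split covaries positively,
print(cov_within_group(0.5, 0.5, 50, 50))   # positive
# while the reverse configuration covaries negatively.
print(cov_within_group(50, 50, 0.5, 0.5))   # negative
```

The sign of the within-group covariance depends on how spread out each level of the hierarchy is. This is what lets a nested-Beta construction represent the positive correlations that the Dirichlet rules out.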
doi_str_mv 10.1007/s00285-018-01325-0
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_2210231983</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2177016523</sourcerecordid><originalsourceid>FETCH-LOGICAL-c375t-b6c7316179fef88471fe2dd8a604196986beb2e9af68d43fb297eef538dd45923</originalsourceid><addsrcrecordid>eNp9kMtKAzEUhoMotlZfwIUMuHEzek4yk2QWLkrxBgU3ug6ZmZPSOpeatEjf3rT1Ai5chATy_f85fIydI1wjgLoJAFznKaCOR_D4OmBDzARPMUN5yIYgQKRSIx-wkxAWAKjyAo_ZQIASGSg1ZLfjZEYdedskztuWPnr_lrjeJ23fUrdKSxuoTmxnm02Yh6R3O3w1r5LaruwpO3K2CXT2dY_Y6_3dy-QxnT4_PE3G07QSKo8lslICJarCkdM6U-iI17W2EjIsZKFlSSWnwjqp60y4kheKyOVC13WWF1yM2NW-d-n79zWFlWnnoaKmsR3162A4R-ACCy0ievkHXfRrH_ePFCoFKHO-pfieqnwfgidnln7eWr8xCGYr1-zlmijX7OQaiKGLr-p12VL9E_m2GQGxB0L86mbkf2f_U_sJdyCCrw</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2177016523</pqid></control><display><type>article</type><title>A general framework for moment-based analysis of genetic data</title><source>SpringerLink Journals - AutoHoldings</source><creator>Speed, Maria Simonsen ; Balding, David Joseph ; Hobolth, Asger</creator><creatorcontrib>Speed, Maria Simonsen ; Balding, David Joseph ; Hobolth, Asger</creatorcontrib><description>In population genetics, the Dirichlet (also called the Balding–Nichols) model has for 20 years been considered the key model to approximate the distribution of allele fractions within populations in a multi-allelic setting. It has often been noted that the Dirichlet assumption is approximate because positive correlations among alleles cannot be accommodated under the Dirichlet model. However, the validity of the Dirichlet distribution has never been systematically investigated in a general framework. This paper attempts to address this problem by providing a general overview of how allele fraction data under the most common multi-allelic mutational structures should be modeled. 
The Dirichlet and alternative models are investigated by simulating allele fractions from a diffusion approximation of the multi-allelic Wright–Fisher process with mutation, and applying a moment-based analysis method. The study shows that the optimal modeling strategy for the distribution of allele fractions depends on the specific mutation process. The Dirichlet model is only an exceptionally good approximation for the pure drift, Jukes–Cantor and parent-independent mutation processes with small mutation rates. Alternative models are required and proposed for the other mutation processes, such as a Beta–Dirichlet model for the infinite alleles mutation process, and a Hierarchical Beta model for the Kimura, Hasegawa–Kishino–Yano and Tamura–Nei processes. Finally, a novel Hierarchical Beta approximation is developed, a Pyramidal Hierarchical Beta model, for the generalized time-reversible and single-step mutation processes.</description><identifier>ISSN: 0303-6812</identifier><identifier>EISSN: 1432-1416</identifier><identifier>DOI: 10.1007/s00285-018-01325-0</identifier><identifier>PMID: 30734077</identifier><language>eng</language><publisher>Berlin/Heidelberg: Springer Berlin Heidelberg</publisher><subject>Alleles ; Applications of Mathematics ; Approximation ; Computer simulation ; Data processing ; Dirichlet problem ; Genetic analysis ; Genetics ; Mathematical analysis ; Mathematical and Computational Biology ; Mathematics ; Mathematics and Statistics ; Mutation ; Mutation rates ; Population genetics</subject><ispartof>Journal of mathematical biology, 2019-05, Vol.78 (6), p.1727-1769</ispartof><rights>Springer-Verlag GmbH Germany, part of Springer Nature 2019</rights><rights>Journal of Mathematical Biology is a copyright of Springer, (2019). 
All Rights Reserved.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c375t-b6c7316179fef88471fe2dd8a604196986beb2e9af68d43fb297eef538dd45923</citedby><cites>FETCH-LOGICAL-c375t-b6c7316179fef88471fe2dd8a604196986beb2e9af68d43fb297eef538dd45923</cites><orcidid>0000-0002-3356-2080 ; 0000-0003-4056-1286 ; 0000-0002-1480-6115</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s00285-018-01325-0$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s00285-018-01325-0$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>314,776,780,27901,27902,41464,42533,51294</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/30734077$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Speed, Maria Simonsen</creatorcontrib><creatorcontrib>Balding, David Joseph</creatorcontrib><creatorcontrib>Hobolth, Asger</creatorcontrib><title>A general framework for moment-based analysis of genetic data</title><title>Journal of mathematical biology</title><addtitle>J. Math. Biol</addtitle><addtitle>J Math Biol</addtitle><description>In population genetics, the Dirichlet (also called the Balding–Nichols) model has for 20 years been considered the key model to approximate the distribution of allele fractions within populations in a multi-allelic setting. It has often been noted that the Dirichlet assumption is approximate because positive correlations among alleles cannot be accommodated under the Dirichlet model. However, the validity of the Dirichlet distribution has never been systematically investigated in a general framework. 
This paper attempts to address this problem by providing a general overview of how allele fraction data under the most common multi-allelic mutational structures should be modeled. The Dirichlet and alternative models are investigated by simulating allele fractions from a diffusion approximation of the multi-allelic Wright–Fisher process with mutation, and applying a moment-based analysis method. The study shows that the optimal modeling strategy for the distribution of allele fractions depends on the specific mutation process. The Dirichlet model is only an exceptionally good approximation for the pure drift, Jukes–Cantor and parent-independent mutation processes with small mutation rates. Alternative models are required and proposed for the other mutation processes, such as a Beta–Dirichlet model for the infinite alleles mutation process, and a Hierarchical Beta model for the Kimura, Hasegawa–Kishino–Yano and Tamura–Nei processes. Finally, a novel Hierarchical Beta approximation is developed, a Pyramidal Hierarchical Beta model, for the generalized time-reversible and single-step mutation processes.</description><subject>Alleles</subject><subject>Applications of Mathematics</subject><subject>Approximation</subject><subject>Computer simulation</subject><subject>Data processing</subject><subject>Dirichlet problem</subject><subject>Genetic analysis</subject><subject>Genetics</subject><subject>Mathematical analysis</subject><subject>Mathematical and Computational Biology</subject><subject>Mathematics</subject><subject>Mathematics and Statistics</subject><subject>Mutation</subject><subject>Mutation rates</subject><subject>Population 
genetics</subject><issn>0303-6812</issn><issn>1432-1416</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2019</creationdate><recordtype>article</recordtype><sourceid>BENPR</sourceid><recordid>eNp9kMtKAzEUhoMotlZfwIUMuHEzek4yk2QWLkrxBgU3ug6ZmZPSOpeatEjf3rT1Ai5chATy_f85fIydI1wjgLoJAFznKaCOR_D4OmBDzARPMUN5yIYgQKRSIx-wkxAWAKjyAo_ZQIASGSg1ZLfjZEYdedskztuWPnr_lrjeJ23fUrdKSxuoTmxnm02Yh6R3O3w1r5LaruwpO3K2CXT2dY_Y6_3dy-QxnT4_PE3G07QSKo8lslICJarCkdM6U-iI17W2EjIsZKFlSSWnwjqp60y4kheKyOVC13WWF1yM2NW-d-n79zWFlWnnoaKmsR3162A4R-ACCy0ievkHXfRrH_ePFCoFKHO-pfieqnwfgidnln7eWr8xCGYr1-zlmijX7OQaiKGLr-p12VL9E_m2GQGxB0L86mbkf2f_U_sJdyCCrw</recordid><startdate>20190501</startdate><enddate>20190501</enddate><creator>Speed, Maria Simonsen</creator><creator>Balding, David Joseph</creator><creator>Hobolth, Asger</creator><general>Springer Berlin Heidelberg</general><general>Springer Nature B.V</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7TK</scope><scope>7TM</scope><scope>7U9</scope><scope>7X7</scope><scope>7XB</scope><scope>88A</scope><scope>88E</scope><scope>8AO</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FH</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>BHPHI</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FR3</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>GNUQQ</scope><scope>H94</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>K9.</scope><scope>L6V</scope><scope>LK8</scope><scope>M0S</scope><scope>M1P</scope><scope>M7N</scope><scope>M7P</scope><scope>M7S</scope><scope>M7Z</scope><scope>P5Z</scope><scope>P62</scope><scope>P64</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0002-3356-2080</orcidid><orcidid>https://orcid.org/0000-0003-4056-1286</orcidid><orcidid>https://orcid.org/0000-0002-1480-6115</orcidid></search><sort><creationdate>20190501</creationdate><title>A general framework for moment-based analysis of genetic data</title><author>Speed, Maria Simonsen ; Balding, David Joseph ; Hobolth, Asger</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c375t-b6c7316179fef88471fe2dd8a604196986beb2e9af68d43fb297eef538dd45923</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2019</creationdate><topic>Alleles</topic><topic>Applications of Mathematics</topic><topic>Approximation</topic><topic>Computer simulation</topic><topic>Data processing</topic><topic>Dirichlet problem</topic><topic>Genetic analysis</topic><topic>Genetics</topic><topic>Mathematical analysis</topic><topic>Mathematical and Computational Biology</topic><topic>Mathematics</topic><topic>Mathematics and Statistics</topic><topic>Mutation</topic><topic>Mutation rates</topic><topic>Population genetics</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Speed, Maria Simonsen</creatorcontrib><creatorcontrib>Balding, David Joseph</creatorcontrib><creatorcontrib>Hobolth, Asger</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>Neurosciences Abstracts</collection><collection>Nucleic Acids Abstracts</collection><collection>Virology and AIDS Abstracts</collection><collection>Health &amp; Medical Collection</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Biology Database (Alumni Edition)</collection><collection>Medical Database (Alumni Edition)</collection><collection>ProQuest Pharma Collection</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech 
Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>Materials Science &amp; Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies &amp; Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>Biological Science Collection</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>Natural Science Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>Engineering Research Database</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Central Student</collection><collection>AIDS and Cancer Research Abstracts</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>ProQuest Engineering Collection</collection><collection>ProQuest Biological Science Collection</collection><collection>Health &amp; Medical Collection (Alumni Edition)</collection><collection>Medical Database</collection><collection>Algology Mycology and Protozoology Abstracts (Microbiology C)</collection><collection>Biological Science Database</collection><collection>Engineering Database</collection><collection>Biochemistry Abstracts 1</collection><collection>Advanced Technologies &amp; Aerospace 
Database</collection><collection>ProQuest Advanced Technologies &amp; Aerospace Collection</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection><collection>MEDLINE - Academic</collection><jtitle>Journal of mathematical biology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Speed, Maria Simonsen</au><au>Balding, David Joseph</au><au>Hobolth, Asger</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A general framework for moment-based analysis of genetic data</atitle><jtitle>Journal of mathematical biology</jtitle><stitle>J. Math. Biol</stitle><addtitle>J Math Biol</addtitle><date>2019-05-01</date><risdate>2019</risdate><volume>78</volume><issue>6</issue><spage>1727</spage><epage>1769</epage><pages>1727-1769</pages><issn>0303-6812</issn><eissn>1432-1416</eissn><abstract>In population genetics, the Dirichlet (also called the Balding–Nichols) model has for 20 years been considered the key model to approximate the distribution of allele fractions within populations in a multi-allelic setting. It has often been noted that the Dirichlet assumption is approximate because positive correlations among alleles cannot be accommodated under the Dirichlet model. However, the validity of the Dirichlet distribution has never been systematically investigated in a general framework. This paper attempts to address this problem by providing a general overview of how allele fraction data under the most common multi-allelic mutational structures should be modeled. 
The Dirichlet and alternative models are investigated by simulating allele fractions from a diffusion approximation of the multi-allelic Wright–Fisher process with mutation, and applying a moment-based analysis method. The study shows that the optimal modeling strategy for the distribution of allele fractions depends on the specific mutation process. The Dirichlet model is only an exceptionally good approximation for the pure drift, Jukes–Cantor and parent-independent mutation processes with small mutation rates. Alternative models are required and proposed for the other mutation processes, such as a Beta–Dirichlet model for the infinite alleles mutation process, and a Hierarchical Beta model for the Kimura, Hasegawa–Kishino–Yano and Tamura–Nei processes. Finally, a novel Hierarchical Beta approximation is developed, a Pyramidal Hierarchical Beta model, for the generalized time-reversible and single-step mutation processes.</abstract><cop>Berlin/Heidelberg</cop><pub>Springer Berlin Heidelberg</pub><pmid>30734077</pmid><doi>10.1007/s00285-018-01325-0</doi><tpages>43</tpages><orcidid>https://orcid.org/0000-0002-3356-2080</orcidid><orcidid>https://orcid.org/0000-0003-4056-1286</orcidid><orcidid>https://orcid.org/0000-0002-1480-6115</orcidid></addata></record>
fulltext fulltext
identifier ISSN: 0303-6812
ispartof Journal of mathematical biology, 2019-05, Vol.78 (6), p.1727-1769
issn 0303-6812
1432-1416
language eng
recordid cdi_proquest_miscellaneous_2210231983
source SpringerLink Journals - AutoHoldings
subjects Alleles
Applications of Mathematics
Approximation
Computer simulation
Data processing
Dirichlet problem
Genetic analysis
Genetics
Mathematical analysis
Mathematical and Computational Biology
Mathematics
Mathematics and Statistics
Mutation
Mutation rates
Population genetics
title A general framework for moment-based analysis of genetic data
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-04T21%3A44%3A23IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20general%20framework%20for%20moment-based%20analysis%20of%20genetic%20data&rft.jtitle=Journal%20of%20mathematical%20biology&rft.au=Speed,%20Maria%20Simonsen&rft.date=2019-05-01&rft.volume=78&rft.issue=6&rft.spage=1727&rft.epage=1769&rft.pages=1727-1769&rft.issn=0303-6812&rft.eissn=1432-1416&rft_id=info:doi/10.1007/s00285-018-01325-0&rft_dat=%3Cproquest_cross%3E2177016523%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2177016523&rft_id=info:pmid/30734077&rfr_iscdi=true