Identifying consistent allele frequency differences in studies of stratified populations

With increasing application of pooled‐sequencing approaches to population genomics, robust methods are needed to accurately quantify allele frequency differences between populations. Identifying consistent differences across stratified populations can allow us to detect genomic regions under selectio...

Bibliographic details
Published in:	Methods in ecology and evolution 2017-12, Vol.8 (12), p.1899-1909
Main authors:	Wiberg, R. Axel W.; Gaggiotti, Oscar E.; Morrissey, Michael B.; Ritchie, Michael G.; Johnson, Louise
Format:	Article
Language:	eng
Subjects:	allele frequency differences; Alleles; Biological evolution; CMH‐test; Computer simulation; Data processing; Evolutionary Biology; experimental evolution; Gene frequency; Gene loci; Identification methods; Phenotypic variations; pool‐seq; Population (statistical); Population genetics; Population studies; quasibinomial GLM; Researchers; Scaling; selection; Software development tools; Statistical analysis; Statistical tests
Online access:	Full text

container_end_page	1909
container_issue	12
container_start_page	1899
container_title	Methods in ecology and evolution
container_volume	8
creator	Wiberg, R. Axel W.; Gaggiotti, Oscar E.; Morrissey, Michael B.; Ritchie, Michael G.; Johnson, Louise
description	With increasing application of pooled‐sequencing approaches to population genomics, robust methods are needed to accurately quantify allele frequency differences between populations. Identifying consistent differences across stratified populations can allow us to detect genomic regions that are under selection and that differ between populations with different histories or attributes. Current popular statistical tests are easily implemented in widely available software tools, which makes them simple for researchers to apply. However, there are potential problems with the way such tests are used, which means that underlying assumptions about the data are frequently violated. These problems are highlighted by simulation of simple but realistic population genetic models of neutral evolution, and the performance of different tests is assessed. We present alternative tests (including Generalised Linear Models [GLMs] with quasibinomial error structure) with attractive properties for the analysis of allele frequency differences and re‐analyse a published dataset. The simulations show that common statistical tests for consistent allele frequency differences perform poorly, with high false positive rates. Applying tests that do not confound heterogeneity and main effects significantly improves inference. Variation in sequencing coverage likely produces many false positives, and re‐scaling allele frequencies to counts out of a common value or an effective sample size reduces this effect. Many researchers are interested in identifying allele frequencies that vary consistently across replicates to identify loci underlying phenotypic responses to selection or natural variation in phenotypes. Popular methods that have been suggested for this task perform poorly in simulations. Overall, quasibinomial GLMs perform better, have the attractive feature of allowing correction for multiple testing by standard procedures, and are easily extended to other designs.
doi_str_mv	10.1111/2041-210X.12810
format	Article
fullrecord	<record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_5726381</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1979504313</sourcerecordid><originalsourceid>FETCH-LOGICAL-c4680-415121d8ab864dd304fd67964226b5947333e20697059f7308ab6013c5e11d7c3</originalsourceid><addsrcrecordid>eNqFkc1LAzEQxYMoKrVnb7Lgxcu2-dpk9yKIVC1UvCh4C9tNopF0U5Ou0v_eWatFvZhLXia_PGbyEDomeERgjSnmJKcEP44ILQneQYfbyu4PfYCGKb1gWKysMOX76IBWVDApy0P0ONWmXTm7du1T1oQ2ubSCQlZ7b7zJbDSvnWmbdaadtSaCNClzbZZWnXYggwUZa3BwRmfLsOw8HMDnCO3Z2icz_NoH6OFqcn95k8_urqeXF7O84aLEOScFoUSX9bwUXGuGudVCVoJTKuZFxSVjzFAsKomLykqGgRSYsKYwhGjZsAE63_guu_nC6Aaaj7VXy-gWdVyrUDv1-6Z1z-opvKlCwh-UBAzOvgxigFnTSi1caoz3dWtClxSpZFVgzggD9PQP-hK62MJ4PUVpgXHZU-MN1cSQUjR22wzBqg9O9dGoPhr1GRy8OPk5w5b_jgkAsQHenTfr__zU7WTCNs4fFcCi3w</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1972250083</pqid></control><display><type>article</type><title>Identifying consistent allele frequency differences in studies of stratified populations</title><source>Wiley Journals</source><source>Alma/SFX Local Collection</source><creator>Wiberg, R. Axel W. ; Gaggiotti, Oscar E. ; Morrissey, Michael B. ; Ritchie, Michael G. ; Johnson, Louise</creator><contributor>Johnson, Louise</contributor><creatorcontrib>Wiberg, R. Axel W. ; Gaggiotti, Oscar E. ; Morrissey, Michael B. ; Ritchie, Michael G. ; Johnson, Louise ; Johnson, Louise</creatorcontrib><description>With increasing application of pooled‐sequencing approaches to population genomics robust methods are needed to accurately quantify allele frequency differences between populations. Identifying consistent differences across stratified populations can allow us to detect genomic regions under selection and that differ between populations with different histories or attributes. 
Current popular statistical tests are easily implemented in widely available software tools which make them simple for researchers to apply. However, there are potential problems with the way such tests are used, which means that underlying assumptions about the data are frequently violated. These problems are highlighted by simulation of simple but realistic population genetic models of neutral evolution and the performance of different tests are assessed. We present alternative tests (including Generalised Linear Models [GLMs] with quasibinomial error structure) with attractive properties for the analysis of allele frequency differences and re‐analyse a published dataset. The simulations show that common statistical tests for consistent allele frequency differences perform poorly, with high false positive rates. Applying tests that do not confound heterogeneity and main effects significantly improves inference. Variation in sequencing coverage likely produces many false positives and re‐scaling allele frequencies to counts out of a common value or an effective sample size reduces this effect. Many researchers are interested in identifying allele frequencies that vary consistently across replicates to identify loci underlying phenotypic responses to selection or natural variation in phenotypes. Popular methods that have been suggested for this task perform poorly in simulations. 
Overall, quasibinomial GLMs perform better and also have the attractive feature of allowing correction for multiple testing by standard procedures and are easily extended to other designs.</description><identifier>ISSN: 2041-210X</identifier><identifier>EISSN: 2041-210X</identifier><identifier>DOI: 10.1111/2041-210X.12810</identifier><identifier>PMID: 29263778</identifier><language>eng</language><publisher>United States: John Wiley & Sons, Inc</publisher><subject>allele frequency differences ; Alleles ; Biological evolution ; CMH‐test ; Computer simulation ; Data processing ; Evolutionary Biology ; experimental evolution ; Gene frequency ; Gene loci ; Identification methods ; Phenotypic variations ; pool‐seq ; Population (statistical) ; Population genetics ; Population studies ; quasibinomial GLM ; Researchers ; Scaling ; selection ; Software development tools ; Statistical analysis ; Statistical tests</subject><ispartof>Methods in ecology and evolution, 2017-12, Vol.8 (12), p.1899-1909</ispartof><rights>2017 The Authors. 
published by John Wiley & Sons Ltd on behalf of British Ecological Society.</rights><rights>Methods in Ecology and Evolution © 2017 British Ecological Society</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c4680-415121d8ab864dd304fd67964226b5947333e20697059f7308ab6013c5e11d7c3</citedby><cites>FETCH-LOGICAL-c4680-415121d8ab864dd304fd67964226b5947333e20697059f7308ab6013c5e11d7c3</cites><orcidid>0000-0002-8074-8670</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://onlinelibrary.wiley.com/doi/pdf/10.1111%2F2041-210X.12810$$EPDF$$P50$$Gwiley$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://onlinelibrary.wiley.com/doi/full/10.1111%2F2041-210X.12810$$EHTML$$P50$$Gwiley$$Hfree_for_read</linktohtml><link.rule.ids>230,314,780,784,885,1417,27924,27925,45574,45575</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/29263778$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><contributor>Johnson, Louise</contributor><creatorcontrib>Wiberg, R. Axel W.</creatorcontrib><creatorcontrib>Gaggiotti, Oscar E.</creatorcontrib><creatorcontrib>Morrissey, Michael B.</creatorcontrib><creatorcontrib>Ritchie, Michael G.</creatorcontrib><creatorcontrib>Johnson, Louise</creatorcontrib><title>Identifying consistent allele frequency differences in studies of stratified populations</title><title>Methods in ecology and evolution</title><addtitle>Methods Ecol Evol</addtitle><description>With increasing application of pooled‐sequencing approaches to population genomics robust methods are needed to accurately quantify allele frequency differences between populations. 
Identifying consistent differences across stratified populations can allow us to detect genomic regions under selection and that differ between populations with different histories or attributes. Current popular statistical tests are easily implemented in widely available software tools which make them simple for researchers to apply. However, there are potential problems with the way such tests are used, which means that underlying assumptions about the data are frequently violated. These problems are highlighted by simulation of simple but realistic population genetic models of neutral evolution and the performance of different tests are assessed. We present alternative tests (including Generalised Linear Models [GLMs] with quasibinomial error structure) with attractive properties for the analysis of allele frequency differences and re‐analyse a published dataset. The simulations show that common statistical tests for consistent allele frequency differences perform poorly, with high false positive rates. Applying tests that do not confound heterogeneity and main effects significantly improves inference. Variation in sequencing coverage likely produces many false positives and re‐scaling allele frequencies to counts out of a common value or an effective sample size reduces this effect. Many researchers are interested in identifying allele frequencies that vary consistently across replicates to identify loci underlying phenotypic responses to selection or natural variation in phenotypes. Popular methods that have been suggested for this task perform poorly in simulations. 
Overall, quasibinomial GLMs perform better and also have the attractive feature of allowing correction for multiple testing by standard procedures and are easily extended to other designs.</description><subject>allele frequency differences</subject><subject>Alleles</subject><subject>Biological evolution</subject><subject>CMH‐test</subject><subject>Computer simulation</subject><subject>Data processing</subject><subject>Evolutionary Biology</subject><subject>experimental evolution</subject><subject>Gene frequency</subject><subject>Gene loci</subject><subject>Identification methods</subject><subject>Phenotypic variations</subject><subject>pool‐seq</subject><subject>Population (statistical)</subject><subject>Population genetics</subject><subject>Population studies</subject><subject>quasibinomial GLM</subject><subject>Researchers</subject><subject>Scaling</subject><subject>selection</subject><subject>Software development tools</subject><subject>Statistical analysis</subject><subject>Statistical tests</subject><issn>2041-210X</issn><issn>2041-210X</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2017</creationdate><recordtype>article</recordtype><sourceid>24P</sourceid><sourceid>WIN</sourceid><recordid>eNqFkc1LAzEQxYMoKrVnb7Lgxcu2-dpk9yKIVC1UvCh4C9tNopF0U5Ou0v_eWatFvZhLXia_PGbyEDomeERgjSnmJKcEP44ILQneQYfbyu4PfYCGKb1gWKysMOX76IBWVDApy0P0ONWmXTm7du1T1oQ2ubSCQlZ7b7zJbDSvnWmbdaadtSaCNClzbZZWnXYggwUZa3BwRmfLsOw8HMDnCO3Z2icz_NoH6OFqcn95k8_urqeXF7O84aLEOScFoUSX9bwUXGuGudVCVoJTKuZFxSVjzFAsKomLykqGgRSYsKYwhGjZsAE63_guu_nC6Aaaj7VXy-gWdVyrUDv1-6Z1z-opvKlCwh-UBAzOvgxigFnTSi1caoz3dWtClxSpZFVgzggD9PQP-hK62MJ4PUVpgXHZU-MN1cSQUjR22wzBqg9O9dGoPhr1GRy8OPk5w5b_jgkAsQHenTfr__zU7WTCNs4fFcCi3w</recordid><startdate>201712</startdate><enddate>201712</enddate><creator>Wiberg, R. 
Axel W.</creator><creator>Gaggiotti, Oscar E.</creator><creator>Morrissey, Michael B.</creator><creator>Ritchie, Michael G.</creator><creator>Johnson, Louise</creator><general>John Wiley & Sons, Inc</general><general>John Wiley and Sons Inc</general><scope>24P</scope><scope>WIN</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7QG</scope><scope>7SN</scope><scope>8FD</scope><scope>C1K</scope><scope>FR3</scope><scope>P64</scope><scope>RC3</scope><scope>7X8</scope><scope>5PM</scope><orcidid>https://orcid.org/0000-0002-8074-8670</orcidid></search><sort><creationdate>201712</creationdate><title>Identifying consistent allele frequency differences in studies of stratified populations</title><author>Wiberg, R. Axel W. ; Gaggiotti, Oscar E. ; Morrissey, Michael B. ; Ritchie, Michael G. ; Johnson, Louise</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c4680-415121d8ab864dd304fd67964226b5947333e20697059f7308ab6013c5e11d7c3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2017</creationdate><topic>allele frequency differences</topic><topic>Alleles</topic><topic>Biological evolution</topic><topic>CMH‐test</topic><topic>Computer simulation</topic><topic>Data processing</topic><topic>Evolutionary Biology</topic><topic>experimental evolution</topic><topic>Gene frequency</topic><topic>Gene loci</topic><topic>Identification methods</topic><topic>Phenotypic variations</topic><topic>pool‐seq</topic><topic>Population (statistical)</topic><topic>Population genetics</topic><topic>Population studies</topic><topic>quasibinomial GLM</topic><topic>Researchers</topic><topic>Scaling</topic><topic>selection</topic><topic>Software development tools</topic><topic>Statistical analysis</topic><topic>Statistical tests</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Wiberg, R. 
Axel W.</creatorcontrib><creatorcontrib>Gaggiotti, Oscar E.</creatorcontrib><creatorcontrib>Morrissey, Michael B.</creatorcontrib><creatorcontrib>Ritchie, Michael G.</creatorcontrib><creatorcontrib>Johnson, Louise</creatorcontrib><collection>Wiley-Blackwell Open Access Titles</collection><collection>Wiley Free Content</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Animal Behavior Abstracts</collection><collection>Ecology Abstracts</collection><collection>Technology Research Database</collection><collection>Environmental Sciences and Pollution Management</collection><collection>Engineering Research Database</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Methods in ecology and evolution</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Wiberg, R. Axel W.</au><au>Gaggiotti, Oscar E.</au><au>Morrissey, Michael B.</au><au>Ritchie, Michael G.</au><au>Johnson, Louise</au><au>Johnson, Louise</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Identifying consistent allele frequency differences in studies of stratified populations</atitle><jtitle>Methods in ecology and evolution</jtitle><addtitle>Methods Ecol Evol</addtitle><date>2017-12</date><risdate>2017</risdate><volume>8</volume><issue>12</issue><spage>1899</spage><epage>1909</epage><pages>1899-1909</pages><issn>2041-210X</issn><eissn>2041-210X</eissn><abstract>With increasing application of pooled‐sequencing approaches to population genomics robust methods are needed to accurately quantify allele frequency differences between populations. 
Identifying consistent differences across stratified populations can allow us to detect genomic regions under selection and that differ between populations with different histories or attributes. Current popular statistical tests are easily implemented in widely available software tools which make them simple for researchers to apply. However, there are potential problems with the way such tests are used, which means that underlying assumptions about the data are frequently violated. These problems are highlighted by simulation of simple but realistic population genetic models of neutral evolution and the performance of different tests are assessed. We present alternative tests (including Generalised Linear Models [GLMs] with quasibinomial error structure) with attractive properties for the analysis of allele frequency differences and re‐analyse a published dataset. The simulations show that common statistical tests for consistent allele frequency differences perform poorly, with high false positive rates. Applying tests that do not confound heterogeneity and main effects significantly improves inference. Variation in sequencing coverage likely produces many false positives and re‐scaling allele frequencies to counts out of a common value or an effective sample size reduces this effect. Many researchers are interested in identifying allele frequencies that vary consistently across replicates to identify loci underlying phenotypic responses to selection or natural variation in phenotypes. Popular methods that have been suggested for this task perform poorly in simulations. 
Overall, quasibinomial GLMs perform better and also have the attractive feature of allowing correction for multiple testing by standard procedures and are easily extended to other designs.</abstract><cop>United States</cop><pub>John Wiley & Sons, Inc</pub><pmid>29263778</pmid><doi>10.1111/2041-210X.12810</doi><tpages>11</tpages><orcidid>https://orcid.org/0000-0002-8074-8670</orcidid><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 2041-210X
ispartof	Methods in ecology and evolution, 2017-12, Vol.8 (12), p.1899-1909
issn	2041-210X 2041-210X
language	eng
recordid	cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_5726381
source	Wiley Journals; Alma/SFX Local Collection
subjects	allele frequency differences; Alleles; Biological evolution; CMH‐test; Computer simulation; Data processing; Evolutionary Biology; experimental evolution; Gene frequency; Gene loci; Identification methods; Phenotypic variations; pool‐seq; Population (statistical); Population genetics; Population studies; quasibinomial GLM; Researchers; Scaling; selection; Software development tools; Statistical analysis; Statistical tests
title	Identifying consistent allele frequency differences in studies of stratified populations
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T10%3A46%3A37IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Identifying%20consistent%20allele%20frequency%20differences%20in%20studies%20of%20stratified%20populations&rft.jtitle=Methods%20in%20ecology%20and%20evolution&rft.au=Wiberg,%20R.%20Axel%20W.&rft.date=2017-12&rft.volume=8&rft.issue=12&rft.spage=1899&rft.epage=1909&rft.pages=1899-1909&rft.issn=2041-210X&rft.eissn=2041-210X&rft_id=info:doi/10.1111/2041-210X.12810&rft_dat=%3Cproquest_pubme%3E1979504313%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1972250083&rft_id=info:pmid/29263778&rfr_iscdi=true
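
Illustrative note on the description field: the abstract states that re‐scaling allele frequencies to counts out of a common value reduces false positives caused by variation in sequencing coverage. The sketch below shows one plausible reading of that step and is not taken from the article itself. The function name `rescale_counts`, the common value of 40, and the sample counts are hypothetical choices made for illustration; the effective-sample-size variant mentioned in the abstract is not shown because its formula is not given in this record.

```python
from typing import List, Tuple


def rescale_counts(alt: int, depth: int, common_depth: int) -> Tuple[int, int]:
    """Re-express an allele count observed at `depth` reads as a count
    out of `common_depth`, preserving the observed allele frequency.

    Returns (scaled_alt, scaled_ref). All names here are hypothetical.
    """
    if depth <= 0 or common_depth <= 0:
        raise ValueError("depth and common_depth must be positive")
    if not 0 <= alt <= depth:
        raise ValueError("alt must lie between 0 and depth")
    # Frequency alt/depth applied to the common value, rounded to a whole count.
    scaled_alt = round(alt * common_depth / depth)
    return scaled_alt, common_depth - scaled_alt


# Hypothetical (alternate-allele count, read depth) pairs for three pools
# sequenced at very different coverage.
samples: List[Tuple[int, int]] = [(30, 60), (150, 300), (12, 20)]

# After rescaling, every pool contributes counts out of the same total (40),
# so a high-coverage pool no longer carries more weight than a low-coverage one.
rescaled = [rescale_counts(alt, depth, 40) for alt, depth in samples]
print(rescaled)
```

The rescaled counts could then be passed to whichever test is being applied (for example a quasibinomial GLM, as the abstract recommends) in place of the raw read counts.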