A parsimony estimator of the number of populations from a STRUCTURE‐like analysis
Published in: | Molecular ecology resources 2019-07, Vol.19 (4), p.970-981 |
---|---|
Main author: | |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | Population genetics model based Bayesian methods have been proposed and widely applied to making unsupervised inference of population structure from a sample of multilocus genotypes. Usually they provide good estimates of the ancestry (or population membership) of sampled individuals by clustering them probabilistically or proportionally into (anonymous) populations. However, they have difficulties in accurately estimating the number of populations (K) represented by the sampled individuals. This study proposed a new ad hoc estimator of K, calculable from the output of a population clustering program such as STRUCTURE or ADMIXTURE. The new criterion, called parsimony index (PI), aims to identify the number of populations (K) which yields consistently the minimal admixture estimates of sampled individuals. Extensive simulated and empirical data were used to compare the accuracy of PI and two popular K estimators based on Pr[X|K] (i.e., the probability of genotype data X given K) and ΔK (i.e., the rate of change of the probability of data as a function of K) calculated from STRUCTURE outputs, and the accuracy of PI and the cross‐validation method calculated from ADMIXTURE outputs. It was shown that PI was more accurate than the other methods consistently in various population structure (e.g., hierarchical island model, different extents of differentiation) and sampling (e.g., unbalanced sample sizes, different marker information contents) scenarios. The ΔK method was more accurate than the Pr[X|K] method only for hierarchically structured or highly inbred populations, and the opposite was true in the other scenarios. The PI method was implemented in a computer program, KFinder, which can be run on all major computer platforms. |
---|---|
ISSN: | 1755-098X 1755-0998 |
DOI: | 10.1111/1755-0998.13000 |
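
As an illustration of one of the comparison estimators named in the abstract, the ΔK criterion can be sketched in a few lines of Python. This is not the PI method or KFinder, whose formula is not given in this record. The sketch assumes the commonly used form of ΔK: the absolute second difference of the mean ln Pr[X|K] across replicate runs, divided by the standard deviation of ln Pr[X|K] at that K. The function names and the input layout (a dict from K to a list of replicate log-likelihoods) are hypothetical choices made for this example.

```python
import statistics


def delta_k(log_likelihoods):
    """Return {K: delta K} for every K that has both neighbours K-1 and K+1.

    log_likelihoods: dict mapping K (int) to a list of ln Pr[X|K] values,
    one per replicate run (hypothetical input layout, at least two runs per K).
    """
    means = {k: statistics.mean(v) for k, v in log_likelihoods.items()}
    result = {}
    for k in sorted(log_likelihoods):
        if k - 1 not in means or k + 1 not in means:
            continue  # the second difference is undefined at the ends of the K range
        sd = statistics.stdev(log_likelihoods[k])
        if sd == 0:
            continue  # identical replicates: the ratio is undefined, so skip this K
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        result[k] = second_diff / sd
    return result


def best_k_by_delta_k(log_likelihoods):
    """Return the K with the largest delta K."""
    scores = delta_k(log_likelihoods)
    return max(scores, key=scores.get)


# Made-up replicate log-likelihoods, for illustration only.
example = {
    1: [-1000.0, -1002.0],
    2: [-900.0, -902.0],
    3: [-700.0, -702.0],
    4: [-690.0, -692.0],
    5: [-685.0, -687.0],
}
print(best_k_by_delta_k(example))
```

Because ΔK relies on a second difference, it cannot be evaluated at the smallest or largest K tested, so it can never select K = 1.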