Data from: How “simple” methodological decisions affect interpretation of population structure based on reduced representation library DNA sequencing: a case study using the lake whitefish
Main authors: | , , , , , |
---|---|
Format: | Dataset |
Language: | eng |
Abstract: | Reduced representation library (RRL) sequencing approaches (e.g., RADSeq,
genotyping by sequencing) require decisions about how much to invest in
genome coverage and sequencing depth (library quality), as well as choices
of values for adjustable bioinformatics parameters. To explore empirically
the importance of these “simple” decisions, we generated two independent
sequencing libraries for the same 142 individual lake whitefish (Coregonus
clupeaformis) using a nextRAD RRL approach: (1) a small number of loci and
low sequencing depth (library A), and (2) more loci and higher sequencing
depth (library B). The fish were selected from populations with different
levels of expected genetic subdivision. Each library was analyzed with
the STACKS pipeline, followed by three types of population structure
assessment (FST, DAPC and ADMIXTURE), while iteratively increasing the
stringency of the sequencing depth and missing data requirements and
using increasingly specific a priori population maps. Library B always
resolved strong population differentiation in all three types of
assessment, regardless of the selected parameters. In contrast, library A
produced more variable results: increasing the minimum sequencing depth
threshold (-m) reduced the number of retained loci, so resolution was lost
at high -m values for FST and ADMIXTURE, but not for DAPC. FST and DAPC
were robust to varying the population map and to increasing the
stringency of missing data requirements, whereas ADMIXTURE was unable to
resolve strong population differentiation in library A when these same
parameters were tightened. Similarly, when examining fine-scale population
subdivision, library B was robust to changing parameters, but library A
lost resolution depending on the parameter set. We used library B to
examine actual subdivision in our study populations. Using 10,640 SNP
loci, all three types of analysis found complete subdivision between the
populations in Lake Huron, ON, and Dore Lake, SK, Canada. Weak population
subdivision was detected within Lake Huron, with fish from the
north-western sites (Search Bay, North Point and Hammond Bay) showing
slight differentiation. Overall, we show that apparently simple decisions
about library quality and bioinformatics parameters can have potentially
important impacts on the interpretation of population subdivision.
Although costly, early investment in a high-quality library and more
conservative stringency settings for STACKS parameters lead to a final
datase… |
DOI: | 10.5061/dryad.4vr8kp3 |
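The abstract reports that, in the low-depth library, raising the minimum depth threshold (-m) and tightening missing-data requirements reduced the number of retained loci. The Python sketch that follows illustrates that mechanism on made-up toy data. It is not STACKS code and does not reproduce the authors' pipeline. It assumes a hypothetical depth matrix (individuals × loci) and counts a genotype as called only when its depth reaches a hypothetical `min_depth` cutoff. This is only loosely analogous to -m, which in STACKS acts when stacks are assembled rather than on a finished genotype table. A locus is kept only if the fraction of called individuals reaches a hypothetical `min_called_frac`.

```python
"""Toy illustration: stricter depth and missing-data filters retain fewer loci.

Not STACKS code. The depth matrix, min_depth and min_called_frac are
hypothetical stand-ins chosen only to illustrate the filtering logic.
"""
from typing import List

# Hypothetical read depths: rows are individuals, columns are loci.
TOY_DEPTHS: List[List[int]] = [
    [2, 8, 15, 4, 30],
    [3, 6, 12, 0, 25],
    [1, 9, 20, 5, 18],
    [4, 7, 10, 3, 22],
]


def retained_loci(depths: List[List[int]], min_depth: int,
                  min_called_frac: float) -> List[int]:
    """Return indices of loci passing both a depth and a call-rate filter.

    A genotype counts as called only if its depth is >= min_depth.
    A locus is kept only if the fraction of called individuals is
    >= min_called_frac.
    """
    if not depths:
        return []
    n_ind = len(depths)
    kept = []
    for j in range(len(depths[0])):
        # Number of individuals whose depth at locus j meets the cutoff.
        called = sum(1 for row in depths if row[j] >= min_depth)
        if called / n_ind >= min_called_frac:
            kept.append(j)
    return kept


if __name__ == "__main__":
    # Raising the depth cutoff at a fixed 75% call rate shrinks the locus set.
    for m in (3, 5, 10, 20):
        print(f"min_depth={m}: {len(retained_loci(TOY_DEPTHS, m, 0.75))} loci")
    # Requiring every individual to be called (no missing data) also drops loci.
    print(f"min_depth=3, 100% called: {len(retained_loci(TOY_DEPTHS, 3, 1.0))} loci")
```

On this toy matrix, raising `min_depth` from 3 to 5, 10 and 20 at a 75% call-rate requirement leaves 4, 3, 2 and 1 of the 5 loci. Requiring a 100% call rate at `min_depth` 3 leaves 3. The numbers carry no biological meaning. They only show that stricter settings remove loci fastest when many genotypes sit near the depth cutoff, as is more likely in a shallowly sequenced library.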