Anous stolidus SNPs from ultraconserved elements (UCEs) in the Southwestern Atlantic Ocean
Main author: | |
---|---|
Format: | Dataset |
Language: | eng |
Summary: | This dataset contains 2062 single nucleotide polymorphisms (SNPs) in VCF format, obtained from ultraconserved elements (UCEs) of the brown noddy Anous stolidus. Individuals (n = 67) belong to six colonies in the Southwestern Atlantic Ocean: the Fernando de Noronha, Abrolhos, and São Pedro e São Paulo archipelagos; Rocas Atoll; and the Trindade and Martin Vaz islands.
We extracted DNA from blood samples using the DNeasy Blood & Tissue Kit (QIAGEN) following the manufacturer’s protocol. DNA samples were quantified with a Qubit fluorometer (Invitrogen). Samples were sequenced by Rapid Genomics LLC using the tetrapod 2.5k probe set, with 2 million reads per sample. We processed the Illumina reads with illumiprocessor (Faircloth 2013) and Trimmomatic (Bolger et al. 2014), and then assembled contigs, identified UCEs, and aligned sequences with phyluce (Faircloth 2015). To remove non-autosomal UCE loci, we ran BLAST+ (Camacho et al. 2009) with the probe set against the complete mitochondrial genome (NCBI; ref: NC_007897.1) and the Z chromosome (NCBI; ref: NC_044241.2) of the zebra finch Taeniopygia guttata. The longest sequence was then used as the reference for calling SNPs in each individual with the Genome Analysis Toolkit (GATK) 4.3.0 (Van der Auwera and O'Connor 2020). We used BWA (Li and Durbin 2009) and SAMtools 1.9 (Danecek et al. 2021) to index the reference sequence and to output and sort BAM files for each individual, running the jobs through GNU parallel (Tange 2018). We removed duplicates with MarkDuplicatesSpark and called haplotypes with HaplotypeCaller 4.3.0 (Poplin et al. 2017). We filtered genotypes with vcftools (Danecek et al. 2011), setting the minimum Q, DP, and GQ values to 30. Base quality score recalibration was performed with BaseRecalibrator and ApplyBQSR, and hard filtering with VariantFiltration. Finally, we removed multiallelic loci and, to avoid retaining linked variants in the final dataset, used vcftools to select the first polymorphism detected in each locus. |
---|---|
DOI: | 10.5281/zenodo.10524759 |
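
The final thinning step described in the summary (dropping multiallelic sites and keeping one SNP per UCE locus) can be illustrated with a minimal Python sketch. This is not the authors' vcftools command. It assumes that the CHROM column of the VCF names the UCE locus, that "first polymorphism" means the first record in file order, and that "multiallelic" refers to individual sites with more than one ALT allele. The function name and the locus names in the example (`uce-10`, `uce-22`) are hypothetical.

```python
from typing import Iterable, Iterator


def first_biallelic_snp_per_locus(lines: Iterable[str]) -> Iterator[str]:
    """Yield header lines plus the first biallelic SNP record of each locus.

    Assumption: the CHROM column (field 1) names the UCE locus, and the
    records of a locus appear in the order in which they should be considered.
    """
    seen_loci = set()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            yield line  # keep meta-information and column header lines
            continue
        fields = line.split("\t")
        locus, ref, alt = fields[0], fields[3], fields[4]
        if "," in alt:
            continue  # multiallelic site: more than one ALT allele
        if len(ref) != 1 or len(alt) != 1 or alt in {".", "*"}:
            continue  # not a single-nucleotide substitution
        if locus in seen_loci:
            continue  # a SNP from this locus was already kept
        seen_loci.add(locus)
        yield line


# Hypothetical records, for illustration only (not taken from the dataset).
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "uce-10\t35\t.\tA\tG,T\t50\tPASS\t.",
    "uce-10\t80\t.\tC\tT\t60\tPASS\t.",
    "uce-10\t120\t.\tG\tA\t70\tPASS\t.",
    "uce-22\t15\t.\tT\tC\t45\tPASS\t.",
]
kept = [
    record
    for record in first_biallelic_snp_per_locus(example)
    if not record.startswith("#")
]
```

In this toy input the multiallelic site at position 35 is skipped, so the record at position 80 is the one retained for `uce-10`, and the later record at position 120 is discarded as potentially linked. Keeping a single SNP per locus is a common way to approximate independence among markers when loci are short.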