Simulated Ancient Genomic Kinship Dataset: VCF and BAM (1x) Files for Related (including inbred) Pairs
Format: | Dataset |
---|---|
Language: | eng |
Summary: | Simulated Ancient Genomic Kinship Dataset: VCF and BAM (1x and 5x) Files for Related (including inbred) Pairs
Description:
This dataset comprises simulated pedigrees (VCF files containing 8,677,101 autosomal biallelic and 298,625 X-chromosomal SNP positions) generated using Ped-sim (v1.3), covering pairs of diverse familial relationship types up to the third degree. The first-degree relationships are parent-offspring and siblings; the second-degree relationships are half-siblings, grandparent-grandchild, and avuncular pairs; and the third-degree relationships are first cousins, great-grandparent-great-grandchild, and grand avuncular pairs. For each of these 8 relationship types, the dataset includes 48 pairs of individuals. It also contains unrelated pairs. Additionally, the dataset includes first- and second-degree relatives with inbreeding: parent-offspring pairs where the parents of the offspring are first cousins, and grandparent-grandchild pairs where the grandchild is the offspring of first cousins. Our simulations cover all sex combinations for each kinship type. The dataset was further enriched by simulating ancient DNA-like sequencing data (5x and 1x BAM files) for the Ped-sim simulated individuals using the gargammel tool, following procedures akin to those used for standard paleogenomic sequencing libraries. Note that the BAM files contain only 200K randomly chosen autosomal SNP positions, which are listed in the "200K_positions" file. Details can be found in Aktürk, Mapelli and Güler et al. 2023.
Data Sources and Generation:
Founder genotypes for pedigree simulation were created from the Tuscany (TSI) population SNPs within the 1000 Genomes Dataset v3. Notably, the founder genotypes lack background relatedness or runs of homozygosity (ROH).
Description of File Naming Conventions:
The naming conventions of the BAM files in this dataset are designed to convey key information regarding the specifics of each file.
cov1x or cov5x: This segment denotes the coverage level of the BAM files, indicating whether the sequencing coverage for the individuals in the files is 1x or 5x.
run_*: Signifies the particular batch from which the pedigree and individuals are derived. This name segment also applies to VCF files.
parent-offspring_* or similar identifiers: Reflects the origin of the individual from the corresponding VCF file. For instance, "parent-offspring_1" corresponds to the individuals present in the "run_*_parent-offspring_1.vcf" file.
parent-offspring |
---|---|
DOI: | 10.5281/zenodo.10070957 |
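The file naming conventions described above can be illustrated with a minimal Python sketch. It is not part of the dataset and rests on assumptions: the record documents only the individual name segments (cov1x/cov5x, run_*, and identifiers such as parent-offspring_1), not their order or separators, so the regular expressions and the example file name below are hypothetical. Only the "run_*_parent-offspring_1.vcf" pattern is taken from the description.

```python
import re
from typing import NamedTuple, Optional


class NameInfo(NamedTuple):
    coverage: Optional[int]  # 1 or 5 for BAM files; None if no cov segment (e.g. VCF)
    run: Optional[str]       # batch identifier following "run_"
    origin: Optional[str]    # pedigree identifier, e.g. "parent-offspring_1"


# Assumed segment patterns; the exact layout of the names is not documented.
COVERAGE = re.compile(r"cov(\d+)x")
RUN = re.compile(r"run_([A-Za-z0-9]+)")
ORIGIN = re.compile(
    r"(?<![A-Za-z0-9-])([A-Za-z]+(?:-[A-Za-z]+)*_\d+)(?![A-Za-z0-9])"
)


def parse_name(filename: str) -> NameInfo:
    """Extract coverage, run, and origin segments from a file name."""
    cov = COVERAGE.search(filename)
    run = RUN.search(filename)
    # The "run_<n>" segment also looks like "<label>_<number>", so skip it.
    origin = next(
        (m.group(1) for m in ORIGIN.finditer(filename)
         if not m.group(1).startswith("run_")),
        None,
    )
    return NameInfo(
        int(cov.group(1)) if cov else None,
        run.group(1) if run else None,
        origin,
    )


def source_vcf(info: NameInfo) -> Optional[str]:
    """Build the VCF name that a BAM file's individuals come from."""
    if info.run is None or info.origin is None:
        return None
    return f"run_{info.run}_{info.origin}.vcf"


# Hypothetical BAM name, used only to exercise the parser.
info = parse_name("cov5x_run_2_parent-offspring_1.bam")
vcf_name = source_vcf(info)
```

The segments are searched for independently rather than matched with one fixed template, so the sketch does not depend on the order in which they appear; it should be checked against the actual file names in the archive before use.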