Simulated Ancient Genomic Kinship Dataset: VCF and BAM (1x) Files for Related (including inbred) Pairs


Bibliographic details
Main authors: Aktürk, Şevval, Mapelli, Igor, Güler, Merve N., Somel, Mehmet
Format: Dataset
Language: eng
creator Aktürk, Şevval
Mapelli, Igor
Güler, Merve N.
Somel, Mehmet
description Simulated Ancient Genomic Kinship Dataset: VCF and BAM (1x and 5x) Files for Related (including inbred) Pairs
Description: This dataset comprises simulated pedigrees generated with Ped-sim (v1.3). The VCF files contain 8,677,101 autosomal biallelic and 298,625 X-chromosomal SNP positions, and the pedigrees comprise pairs of diverse familial relationship types up to the third degree. The first-degree relationships are parent-offspring and siblings; the second-degree relationships are half-siblings, grandparent-grandchild, and avuncular pairs; and the third-degree relationships are first cousins, great-grandparent-great-grandchild, and grand avuncular pairs. For each of these 8 relationship types, the dataset includes 48 pairs of individuals. It also contains unrelated pairs. Additionally, it includes first- and second-degree relatives with inbreeding: parent-offspring pairs in which the offspring's parents are first cousins, and grandparent-grandchild pairs in which the grandchild is the offspring of first cousins. The simulations cover all sex combinations for each kinship type. The dataset was further enriched by simulating ancient DNA-like sequencing data (5x and 1x BAM files) for the Ped-sim individuals with the gargammel tool, following procedures akin to standard paleogenomic sequencing libraries. Note that the BAM files cover only 200K randomly chosen autosomal SNP positions, which are listed in the "200K_positions" file. Details can be found in Aktürk, Mapelli and Güler et al. 2023.
Data Sources and Generation: Founder genotypes for the pedigree simulations were created from the SNPs of the Tuscany (TSI) population in the 1000 Genomes Dataset v3. Notably, the founder genotypes lack background relatedness and runs of homozygosity (ROH).
Description of File Naming Conventions: The BAM file names encode key information about each file:
cov1x or cov5x: the sequencing coverage of the individual in the file, either 1x or 5x.
run_*: the batch from which the pedigree and its individuals are derived. This segment also appears in the VCF file names.
parent-offspring_* or similar identifiers: the VCF file from which the individual originates. For instance, "parent-offspring_1" corresponds to the individuals in the "run_*_parent-offspring_1.vcf" file.
parent-offspring* or similar identifiers: the set within that VCF file to which the individual belongs. For example, "parent-offspring1" denotes the first set of parent-offspring pedigrees in the VCF file. The parent-offspring, grandparent-grandchild, great-grandparent-great-grandchild, and inbreeding VCFs contain only one set, so this identifier is always 1 for them; for the remaining pedigree types it can be 1 or 2, as their VCF files contain two sets of related pairs.
_g*-b*-: the individual's generational position within the pedigree, following the Ped-sim syntax. For example, for the parent-offspring type, "_g1-b1-" indicates the first parent (generation 1) within a given pedigree, "_g1-b2-" the second parent (generation 1), and "_g2-b1-" the offspring (generation 2).
Example Naming Structure: The file "cov1x_run1_parent-offspring_1_parent-offspring1_g1-b1-i1.all.hs37d5.cons.90perc.trimBAM.bam" is a BAM file with 1x coverage from "run1", containing an individual from the "run_*_parent-offspring_1.vcf" file (first set of parent-offspring pairs), where "_g1-b1-" designates the first parent in the first generation. The final part of the name, "hs37d5.cons.90perc.trimBAM.bam", is the same across all files.
Note1: Segments such as parent-offspring*_g*-b*- can also be traced in the names of the genotype columns in the VCF.
Note2: Sex can be determined from the genotypes at X-chromosome positions in the VCF files: individuals with diploid X genotypes (two alleles) are female, and those with haploid X genotypes (a single allele) are male.
Note3: Some individuals from distinct pedigrees may in fact be related through shared founders. Depending on the research objective, researchers may need to identify and exclude such relatives when using the full dataset for kinship estimation.
For more details about the dataset's generation process, its characteristics, or any specific questions, please contact our team; we welcome inquiries and are glad to provide additional information that may help researchers use this dataset effectively.
This repository contains only the VCFs, the cov1x BAM files, and the 200K_positions file. The remaining files are available at 10.5281/zenodo.10079625 and 10.5281/zenodo.10079685.
doi_str_mv 10.5281/zenodo.10070957
format Dataset
publisher Zenodo
date 2023-11-09
orcidid 0000-0002-3138-1307
0000-0003-4157-6551
0000-0001-7766-9333
identifier DOI: 10.5281/zenodo.10070957
language eng
recordid cdi_datacite_primary_10_5281_zenodo_10070957
source DataCite
subjects ancient dna
first-degree
gargammel
genome
kinship
low-coverage
Ped-sim
pedigree
relatedness
second-degree
simulation
third-degree
title Simulated Ancient Genomic Kinship Dataset: VCF and BAM (1x) Files for Related (including inbred) Pairs
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-12T06%3A32%3A39IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-datacite_PQ8&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=unknown&rft.au=Akt%C3%BCrk,%20%C5%9Eevval&rft.date=2023-11-09&rft_id=info:doi/10.5281/zenodo.10070957&rft_dat=%3Cdatacite_PQ8%3E10_5281_zenodo_10070957%3C/datacite_PQ8%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
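The relationship structure in the description can be tabulated as a quick sanity check when loading the data. This Python sketch hard-codes the eight relationship types with their stated degrees and 48 pairs per type. The expected kinship coefficients are the textbook values for outbred relatives (0.5^(d+1) for degree d); the dataset description does not state them, and the inbred and unrelated pairs are left out because their counts are not given in this record. The labels are descriptive and may not match the spelling used in the file names.

```python
from collections import Counter

# Relationship types and degrees as listed in the dataset description.
# Labels are descriptive; spelling in the actual VCF names may differ (assumption).
RELATIONSHIP_DEGREE = {
    "parent-offspring": 1,
    "siblings": 1,
    "half-siblings": 2,
    "grandparent-grandchild": 2,
    "avuncular": 2,
    "first cousins": 3,
    "great-grandparent-great-grandchild": 3,
    "grand avuncular": 3,
}
PAIRS_PER_TYPE = 48  # stated for each of the 8 relationship types


def expected_kinship(degree: int) -> float:
    """Textbook expected kinship coefficient for outbred relatives of a given degree.

    Not part of the dataset description; inbred pairs are expected to deviate.
    """
    return 0.5 ** (degree + 1)


# Count related (non-inbred) pairs per degree: 8 types x 48 pairs in total.
pairs_by_degree = Counter()
for relationship, degree in RELATIONSHIP_DEGREE.items():
    pairs_by_degree[degree] += PAIRS_PER_TYPE

total_pairs = sum(pairs_by_degree.values())

for degree in sorted(pairs_by_degree):
    print(f"degree {degree}: {pairs_by_degree[degree]} pairs, "
          f"expected kinship {expected_kinship(degree)}")
print("total related pairs:", total_pairs)
```

Note that first-degree siblings and parent-offspring pairs share the same expected kinship coefficient, so distinguishing them usually requires additional information beyond this single number.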
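Because the BAM files cover only the positions listed in "200K_positions", comparisons against the VCF genotypes will likely be restricted to those sites. Below is a minimal Python sketch, assuming the positions file holds a chromosome and a position in two whitespace-separated columns (its actual layout is not documented in this record) and that chromosome names are spelled the same way in both files. `load_positions` and `restrict_vcf` are hypothetical helpers.

```python
from typing import Iterable, Iterator, Set, Tuple

Position = Tuple[str, int]


def load_positions(lines: Iterable[str]) -> Set[Position]:
    """Read (chromosome, position) pairs, assuming two whitespace-separated columns."""
    positions: Set[Position] = set()
    for line in lines:
        parts = line.split()
        # Skip blank lines or a header row whose second column is not numeric.
        if len(parts) < 2 or not parts[1].isdigit():
            continue
        positions.add((parts[0], int(parts[1])))
    return positions


def restrict_vcf(lines: Iterable[str], positions: Set[Position]) -> Iterator[str]:
    """Yield all VCF header lines plus only the records at the given positions."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        chrom, pos = line.split("\t", 2)[:2]
        if (chrom, int(pos)) in positions:
            yield line
```

For full-size VCFs, `bcftools view -T` with a targets file performs the same restriction as a streaming tool and is probably the more practical choice.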
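The file naming convention can be decoded mechanically. The sketch below parses BAM names such as the example given in the description using a regular expression. Several details are our assumptions and should be checked against the actual files: segment labels contain only letters and hyphens; the "b" number is read as Ped-sim's branch; the code after the last hyphen (e.g. "i1") is a letter followed by a number; and, based on Note1, the matching VCF genotype column is named `<set label><set index>_g<generation>-b<branch>-<member>`. `BamName` and `parse_bam_name` are hypothetical names.

```python
import os
import re
from dataclasses import dataclass
from typing import Optional

# Stem of a name such as "cov1x_run1_parent-offspring_1_parent-offspring1_g1-b1-i1".
# Assumption: the stem has no dots, so the shared suffix
# (".all.hs37d5.cons.90perc.trimBAM.bam") is dropped at the first ".".
STEM_RE = re.compile(
    r"^cov(?P<coverage>\d+)x"                          # cov1x or cov5x
    r"_run(?P<run>\d+)"                                # simulation batch
    r"_(?P<vcf_type>[A-Za-z-]+)_(?P<vcf_index>\d+)"    # source VCF, e.g. parent-offspring_1
    r"_(?P<set_type>[A-Za-z-]+)(?P<set_index>\d+)"     # set in that VCF, e.g. parent-offspring1
    r"_g(?P<generation>\d+)-b(?P<branch>\d+)-(?P<member>[a-z]\d+)$"  # Ped-sim ID
)


@dataclass(frozen=True)
class BamName:
    coverage: int
    run: int
    vcf_type: str
    vcf_index: int
    set_type: str
    set_index: int
    generation: int
    branch: int
    member: str

    @property
    def vcf_sample(self) -> str:
        """Assumed VCF genotype-column name (Note1): set label plus Ped-sim ID."""
        return (f"{self.set_type}{self.set_index}"
                f"_g{self.generation}-b{self.branch}-{self.member}")


def parse_bam_name(path: str) -> Optional[BamName]:
    """Split a BAM file name into its segments; return None if it does not match."""
    stem = os.path.basename(path).split(".", 1)[0]
    match = STEM_RE.match(stem)
    if match is None:
        return None
    g = match.groupdict()
    return BamName(
        coverage=int(g["coverage"]),
        run=int(g["run"]),
        vcf_type=g["vcf_type"],
        vcf_index=int(g["vcf_index"]),
        set_type=g["set_type"],
        set_index=int(g["set_index"]),
        generation=int(g["generation"]),
        branch=int(g["branch"]),
        member=g["member"],
    )
```

Returning `None` rather than raising makes it easy to list files that do not follow the assumed pattern, for example the inbreeding BAMs, whose exact naming is not shown in this record.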
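Note2 suggests a simple way to recover sex from the VCFs: count the alleles in each X-chromosome GT call. This Python sketch reads plain-text VCF lines and assumes that the X chromosome is labeled "X" (configurable via `x_chrom`, since the actual label is not stated here) and that every record carries a GT field. It calls a sample female only if all of its X genotypes are diploid and male only if all are haploid. `infer_sex` is a hypothetical helper, not part of the dataset.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List


def gt_ploidy(gt: str) -> int:
    """Number of alleles in a GT value: "0|1" has 2, "1" has 1."""
    return len(re.split(r"[/|]", gt))


def infer_sex(vcf_lines: Iterable[str], x_chrom: str = "X") -> Dict[str, str]:
    """Classify samples as female (diploid X calls) or male (haploid X calls)."""
    samples: List[str] = []
    ploidy_counts: Dict[str, Counter] = {}
    for line in vcf_lines:
        if line.startswith("##"):
            continue  # meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]
            ploidy_counts = {s: Counter() for s in samples}
            continue
        if fields[0] != x_chrom:
            continue  # only X-chromosome records are informative here
        gt_index = fields[8].split(":").index("GT")
        for sample, entry in zip(samples, fields[9:]):
            ploidy_counts[sample][gt_ploidy(entry.split(":")[gt_index])] += 1

    sexes: Dict[str, str] = {}
    for sample, counts in ploidy_counts.items():
        observed = set(counts)
        if observed == {2}:
            sexes[sample] = "female"
        elif observed == {1}:
            sexes[sample] = "male"
        else:
            sexes[sample] = "unknown"  # mixed ploidy or no X records
    return sexes
```

Requiring every X genotype to agree is deliberately strict; a sample reported as "unknown" points either to a chromosome-label mismatch or to records that do not follow the convention described in Note2.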