A compilation of tri-allelic SNPs from 1000 Genomes and use of the most polymorphic loci for a large-scale human identification panel
Published in: | Forensic Science International: Genetics 2020-05, Vol.46, p.102232, Article 102232 |
---|---|
Main authors: | , , , , , , , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Full text |
Summary: | • 271,934 tri-allelic SNPs were identified in the 1000 Genomes Phase III variant catalog, and the data have been compiled in Mendeley Data for free access.
• From this extensive dataset, 8,705 SNPs had heterozygosity values above 0.5, the maximum value for perfect binary SNPs (0.5:0.5 allele frequencies).
• A large-scale forensic identification multiplex was constructed for MPS, comprising 1,241 autosomal plus 29 X tri-allelic SNPs.
• Approximately 5% of tri-allelic SNPs selected for the large-scale MPS panel gave three-genotype patterns in one individual or discordant genotypes.
• The need for caution and detailed scrutiny of multiple-allele variant data is highlighted when designing future forensic SNP panels.
In a directed search of 1000 Genomes Phase III variation data, 271,934 tri-allelic single nucleotide polymorphisms (SNPs) were identified amongst the genotypes of 2,504 individuals from 26 populations. The majority of tri-allelic SNPs have three nucleotide substitution-based alleles at the same position, while a much smaller proportion, which we did not compile, have a nucleotide insertion/deletion plus substitution alleles. SNPs with three alleles have higher discrimination power than binary loci but keep the same characteristic of optimum amplification of the fragmented DNA found in highly degraded forensic samples. Although most of the tri-allelic SNPs identified had one or two alleles at low frequencies, often single observations, we present a full compilation of the genome positions, rs-numbers and genotypes of all tri-allelic SNPs detected by the 1000 Genomes project from the more detailed analyses it applied to Phase III sequence data. A total of 8,705 tri-allelic SNPs had overall heterozygosities (averaged across all 1000 Genomes populations) higher than the binary SNP maximum value of 0.5. Of these, 1,637 displayed the highest average heterozygosity values of 0.6-0.666. The most informative tri-allelic SNPs we identified were used to construct a large-scale human identification panel for massively parallel sequencing, designed for the identification of missing persons. The large-scale MPS identification panel comprised 1,241 autosomal tri-allelic SNPs and 29 X tri-allelic SNPs (plus 46 microhaplotypes adapted for genotyping from reduced length sequences). Allele frequency estimates are detailed for African, European, South Asian and East Asian population groups plus the Peruvian population sampled by 1000 Genomes for the 1,270 tri-allelic SNPs of the fi |
---|---|
ISSN: | 1872-4973, 1878-0326 |
DOI: | 10.1016/j.fsigen.2020.102232 |
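
The heterozygosity thresholds quoted in the summary (0.5 for a perfectly balanced binary SNP, and an upper range ending at 0.666 for tri-allelic SNPs) can be checked with a short calculation. The following is a minimal sketch, assuming the expected heterozygosity under Hardy-Weinberg proportions, H = 1 - sum(p_i^2); this record does not state which estimator the authors applied, and the function name `expected_heterozygosity` is purely illustrative.

```python
def expected_heterozygosity(allele_freqs):
    """Expected heterozygosity H = 1 - sum(p_i^2) for one locus.

    Assumption: Hardy-Weinberg proportions; the source record does not
    specify the estimator used in the paper.
    """
    total = sum(allele_freqs)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    if any(p < 0 for p in allele_freqs):
        raise ValueError("allele frequencies must be non-negative")
    return 1.0 - sum(p * p for p in allele_freqs)


# Perfectly balanced binary SNP (0.5:0.5): the binary maximum.
binary_max = expected_heterozygosity([0.5, 0.5])

# Perfectly balanced tri-allelic SNP (1/3 each): the tri-allelic maximum.
tri_max = expected_heterozygosity([1 / 3, 1 / 3, 1 / 3])

# A tri-allelic SNP with one rare allele falls below the binary maximum.
skewed = expected_heterozygosity([0.70, 0.29, 0.01])

print(f"binary max: {binary_max:.3f}")
print(f"tri-allelic max: {tri_max:.3f}")
print(f"skewed tri-allelic: {skewed:.3f}")
```

Under this assumption a tri-allelic SNP exceeds the binary ceiling only when all three alleles are reasonably common, which is consistent with the summary's observation that only 8,705 of the 271,934 compiled SNPs pass the 0.5 threshold.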