Limitations of the human reference genome for personalized genomics

Bibliographic details
Published in:	PLoS ONE 2012-07, Vol.7 (7), p.e40294
Authors:	Rosenfeld, Jeffrey A; Mason, Christopher E; Smith, Todd M
Format:	Article
Language:	English
Subjects:	Algorithms; Arrays; Bioinformatics; Biology; Chromosomes; Comparative analysis; Data collection; DNA probes; Genes; Genetic aspects; Genetic diversity; Genetic variance; Genetic Variation; Genetics; Genome, Human; Genome-Wide Association Study; Genomes; Genomics; HapMap Project; Health risk assessment; Human populations; Humans; INDEL Mutation; Informatics; Linkage Disequilibrium; Medical research; Medical treatment; Medicine; Polymorphism, Single Nucleotide; Single nucleotide polymorphisms; Single-nucleotide polymorphism

Description
Summary:	Data from the 1000 Genomes Project (1KGP) and Complete Genomics (CG) have dramatically increased the number of known genetic variants and challenge several assumptions about the reference genome and its uses in both clinical and research settings. Specifically, 34% of published array-based GWAS studies for a variety of diseases utilize probes that overlap unanticipated single nucleotide polymorphisms (SNPs), indels, or structural variants. Linkage disequilibrium (LD) block length depends on the number of markers used, and the mean LD block size decreases from 16 kb to 7 kb when HapMap-based calculations are compared to blocks computed from 1KGP data. Additionally, when 1KGP and CG variants are compared, 19% of the single nucleotide variants (SNVs) reported from common genomes are unique to one dataset, likely a result of differences in data collection methodology, alignment of reads to the reference genome, and variant-calling algorithms. Together, these observations indicate that current research resources and informatics methods do not adequately account for the high level of variation that already exists in the human population, and that significant efforts are needed to create resources that can accurately assess personal genomics for health and disease and predict treatment outcomes.
ISSN:	1932-6203
DOI:	10.1371/journal.pone.0040294