Integrating sequence and array data to create an improved 1000 Genomes Project haplotype reference panel

Bibliographic details
Published in:	Nature Communications 2014-06, Vol.5 (1), p.3934-3934, Article 3934
Main authors:	Delaneau, Olivier, Marchini, Jonathan
Format:	Article
Language:	English
Subjects:	45/23; 631/1647/1513; 631/208/212; 631/208/727/728; Algorithms; Alleles; Gene Frequency; Genome, Human; Genome-Wide Association Study; Haplotypes; Humanities and Social Sciences; Humans; Microarray Analysis - statistics & numerical data; multidisciplinary; Polymorphism, Single Nucleotide; Science; Science (multidisciplinary)

Description
Summary:	A major use of the 1000 Genomes Project (1000GP) data is genotype imputation in genome-wide association studies (GWAS). Here we develop a method to estimate haplotypes from low-coverage sequencing data that can take advantage of single-nucleotide polymorphism (SNP) microarray genotypes on the same samples. First the SNP array data are phased to build a backbone (or ‘scaffold’) of haplotypes across each chromosome. We then phase the sequence data ‘onto’ this haplotype scaffold. This approach can take advantage of relatedness between sequenced and non-sequenced samples to improve accuracy. We use this method to create a new 1000GP haplotype reference set for use by the human genetic community. Using a set of validation genotypes at SNP and bi-allelic indels we show that these haplotypes have lower genotype discordance and improved imputation performance into downstream GWAS samples, especially at low-frequency variants. 1000 Genomes imputation can increase the power of genome-wide association studies to detect genetic variants associated with human traits and diseases. Here, the authors develop a method to integrate and analyse low-coverage sequence data and SNP array data, and show that it improves imputation performance.
ISSN:	2041-1723
DOI:	10.1038/ncomms4934