Imputation across genotyping arrays for genome-wide association studies: assessment of bias and a correction strategy

Bibliographic details
Published in:	Human genetics 2013-05, Vol.132 (5), p.509-522
Main authors:	Johnson, Eric O., Hancock, Dana B., Levy, Joshua L., Gaddis, Nathan C., Saccone, Nancy L., Bierut, Laura J., Page, Grier P.
Format:	Article
Language:	English
Subjects:	African Americans; Algorithms; Analysis; Arrays; Bias; Biomedical and Life Sciences; Biomedicine; Black or African American - genetics; Female; Gene Frequency; Gene Function; Genome, Human - genetics; Genome-Wide Association Study; Genomes; Genomics; Genotype; Haplotypes; HapMap Project; Human Genetics; Humans; Male; Metabolic Diseases; Models, Statistical; Molecular Medicine; Oligonucleotide Array Sequence Analysis; Original Investigation; Phenotype; Polymorphism, Single Nucleotide - genetics; Sample size; Statistical power; White People - genetics
Online access:	Full text

Description
Abstract:	A great promise of publicly sharing genome-wide association data is the potential to create composite sets of controls. However, studies often use different genotyping arrays, and imputation to a common set of SNPs has shown substantial bias: a problem which has no broadly applicable solution. Based on the idea that using differing genotyped SNP sets as inputs creates differential imputation errors and thus bias in the composite set of controls, we examined the degree to which each of the following occurs: (1) imputation based on the union of genotyped SNPs (i.e., SNPs available on one or more arrays) results in bias, as evidenced by spurious associations (type 1 error) between imputed genotypes and arbitrarily assigned case/control status; (2) imputation based on the intersection of genotyped SNPs (i.e., SNPs available on all arrays) does not evidence such bias; and (3) imputation quality varies by the size of the intersection of genotyped SNP sets. Imputations were conducted in European Americans and African Americans with reference to HapMap phase II and III data. Imputation based on the union of genotyped SNPs across the Illumina 1M and 550v3 arrays showed spurious associations for 0.2% of SNPs: ~2,000 false positives per million SNPs imputed. Biases remained problematic for very similar arrays (550v1 vs. 550v3) and were substantial for dissimilar arrays (Illumina 1M vs. Affymetrix 6.0). In all instances, imputing based on the intersection of genotyped SNPs (as few as 30% of the total SNPs genotyped) eliminated such bias while still achieving good imputation quality.
ISSN:	0340-6717, 1432-1203
DOI:	10.1007/s00439-013-1266-7
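
The abstract contrasts two choices of input SNPs for imputation: the union (SNPs on one or more arrays) and the intersection (SNPs on all arrays). The following is a minimal sketch of that distinction, not the authors' pipeline. The array names, the rs identifiers, and the function `snp_sets_for_imputation` are hypothetical, and reading "30% of the total SNPs genotyped" as intersection size divided by union size is an assumption. The last lines check the abstract's arithmetic that 0.2% of one million imputed SNPs is about 2,000.

```python
def snp_sets_for_imputation(arrays):
    """Return (union, intersection) of the SNP IDs genotyped on each array.

    `arrays` maps a hypothetical array name to the set of SNP IDs on it.
    """
    sets = list(arrays.values())
    if not sets:
        return set(), set()
    union = set().union(*sets)              # SNPs on one or more arrays
    intersection = set.intersection(*sets)  # SNPs on all arrays
    return union, intersection


# Toy data: made-up SNP IDs, not taken from any real array manifest.
arrays = {
    "array_a": {"rs1", "rs2", "rs3", "rs4", "rs5"},
    "array_b": {"rs2", "rs3", "rs5", "rs6"},
}

union, intersection = snp_sets_for_imputation(arrays)

# Assumed interpretation: share of all genotyped SNPs present on every array.
fraction_shared = len(intersection) / len(union)

# Arithmetic from the abstract: 0.2% spurious associations per million SNPs.
expected_false_positives = round(0.002 * 1_000_000)

print(sorted(intersection), fraction_shared, expected_false_positives)
```

In this sketch, only the SNPs in `intersection` would be passed to the imputation software as genotyped input for every cohort, so that all samples are imputed from the same starting set.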