Genetic association models are robust to common population kinship estimation biases

Detailed description

Bibliographic details
Published in:	Genetics (Austin) 2023-05, Vol.224 (1)
Main authors:	Hou, Zhuoran; Ochoa, Alejandro
Format:	Article
Language:	English
Subjects:	Bias; Estimators; Generalized linear models; Genomes; Genotype; Genotypes; Humans; Invariants; Investigation; Linear algebra; Linear Models; Matrices (mathematics); Models, Genetic; Phenotype; Population Groups - genetics; Principal components analysis; Robustness; Statistical analysis; Statistical models
Online access:	Full text

Description
Summary:	Abstract: Common genetic association models for structured populations, including principal component analysis (PCA) and linear mixed-effects models (LMMs), model the correlation structure between individuals using population kinship matrices, also known as genetic relatedness matrices. However, the most common kinship estimators can have severe biases that were only recently determined. Here we characterize the effect of these kinship biases on genetic association. We employ a large simulated admixed family and genotypes from the 1000 Genomes Project, both with simulated traits, to evaluate key kinship estimators. Remarkably, we find practically invariant association statistics for kinship matrices of different bias types (matching all other features). We then prove using statistical theory and linear algebra that LMM association tests are invariant to these kinship biases, and PCA approximately so. Our proof shows that the intercept and relatedness effect coefficients compensate for the kinship bias, an argument that extends to generalized linear models. As a corollary, association testing is also invariant to changing the reference ancestral population of the kinship matrix. Lastly, we observed that all kinship estimators, except for popkin ratio-of-means, can give improper non-positive semidefinite matrices, which can be problematic although some LMMs handle them surprisingly well, and condition numbers can be used to choose kinship estimators. Overall, we find that existing association studies are robust to kinship estimation bias, and our calculations may help improve association methods by taking advantage of this unexpected robustness, as well as help determine the effects of kinship bias in related problems.
Article summary:	The most popular genetic association models for structured populations use kinship matrices to model population structure; however, the most common kinship estimator is biased. Here, Hou and Ochoa characterize the effect of kinship bias on genetic association and discover that there is no effect. They prove, theoretically and empirically, how kinship bias is compensated for by the regression intercept and report novel findings regarding variant weighing and power, as well as non-positive semidefinite estimates and their effect on numerical accuracy.
ISSN:	1943-2631, 0016-6731
DOI:	10.1093/genetics/iyad030
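
The invariance claim in the summary above can be illustrated numerically. The following is a minimal sketch, not the authors' derivation or code. It assumes a hypothetical bias of the form K' = (K - c) / (1 - c) for a scalar c, which is one simple way a change of reference ancestral population might look, and it assumes fixed variance components in which the genetic variance is rescaled by (1 - c). The function name `wald_stat` and all simulated inputs are invented for illustration. Under these assumptions the biased covariance differs from the original only by a constant added to every entry, and a model with an intercept absorbs that constant.

```python
import numpy as np


def wald_stat(y, x, V):
    """GLS Wald statistic for the slope of x in y = a*1 + b*x + e, cov(e) = V."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])      # intercept plus tested variant
    Vinv_X = np.linalg.solve(V, X)            # V^{-1} X without forming V^{-1}
    Vinv_y = np.linalg.solve(V, y)            # V^{-1} y
    XtVinvX = X.T @ Vinv_X
    beta = np.linalg.solve(XtVinvX, X.T @ Vinv_y)   # GLS coefficient estimates
    cov_beta = np.linalg.inv(XtVinvX)               # their covariance (V known)
    return beta[1] ** 2 / cov_beta[1, 1]


rng = np.random.default_rng(0)
n = 50

# Hypothetical "true" kinship: a shared baseline of 0.1 plus a PSD component.
G = rng.normal(size=(n, 200))
B = 0.05 * (G @ G.T) / 200
K = 0.1 + B                                   # adds 0.1 to every entry

# Assumed bias form (illustration only): shift by c, rescale by 1 - c.
c = 0.05
K_biased = (K - c) / (1 - c)

# Fixed variance components; the genetic variance is rescaled by (1 - c)
# for the biased matrix, mimicking what a fitted LMM could do on its own.
sigma_g2, sigma_e2 = 1.0, 0.5
V = 2 * sigma_g2 * K + sigma_e2 * np.eye(n)
V_biased = 2 * (1 - c) * sigma_g2 * K_biased + sigma_e2 * np.eye(n)

# Simulated genotype dosages and a null trait.
x = rng.binomial(2, 0.3, size=n).astype(float)
y = rng.normal(size=n)

stat = wald_stat(y, x, V)
stat_biased = wald_stat(y, x, V_biased)
print(stat, stat_biased)                      # the two values should agree
```

In this sketch `V_biased` equals `V` minus the constant `2 * sigma_g2 * c` in every entry, so the slope estimate and its variance are the same for both matrices up to floating-point error. A real LMM estimates the variance components rather than fixing them, so this only shows the linear-algebra core of the argument.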