Evaluation of Designs and Estimation Methods Under Response-Dependent Two-Phase Sampling for Genetic Association Studies
Published in: | Statistics in Biosciences 2023-07, Vol.15 (2), p.510-539 |
---|---|
Main authors: | , , , |
Format: | Article |
Language: | English |
Online access: | Full text |
Abstract: | In many genetic association analyses, while the aim is to identify genetic variants associated with a given quantitative trait, budgetary constraints prevent genotyping all individuals in a cohort. Selection of individuals for genotyping according to their quantitative trait value can improve cost efficiency. We consider quantitative trait-dependent two-phase sampling designs. In the first phase, trait and inexpensive covariate values for all individuals in a cohort are obtained; in the second phase, genetic sequence data for a subset of individuals are obtained according to their trait values and possibly their inexpensive covariates. We consider the likelihood and pseudo-likelihood methods proposed to analyze response-biased samples, assess their performance under common, low-frequency, and rare variant analyses, compare their efficiencies and investigate efficient response-dependent sampling designs under each method. We also assess robustness of the estimation methods and sampling designs under misspecified models. The results show that extreme sampling is the most efficient design for common variant analysis, and that selecting a small sample from the middle stratum improves accuracy and precision in low-frequency and rare variant analyses. Likelihood methods under an extreme sampling design generally give the most accurate and precise estimates when the model is correctly specified. Both the estimated pseudo-likelihood and pseudo-conditional likelihood methods become more efficient under model misspecification. |
ISSN: | 1867-1764 1867-1772 |
DOI: | 10.1007/s12561-023-09369-7 |
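
The two-phase design summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `select_phase_two`, the cutoffs, the stratum sample sizes, and the simulated cohort are all hypothetical choices made for illustration. The sketch assumes the trait is already measured for the whole cohort (phase one), then picks individuals for genotyping (phase two) by simple random sampling within a low, a middle, and a high trait stratum.

```python
import random


def select_phase_two(traits, low_cut, high_cut, n_low, n_mid, n_high, seed=0):
    """Return sorted indices of individuals chosen for genotyping.

    All parameter names and the three-stratum layout are illustrative
    assumptions, not taken from the article.
    """
    rng = random.Random(seed)

    # Phase one: the trait is known for everyone, so strata are built from it.
    low = [i for i, y in enumerate(traits) if y < low_cut]
    mid = [i for i, y in enumerate(traits) if low_cut <= y <= high_cut]
    high = [i for i, y in enumerate(traits) if y > high_cut]

    # Phase two: simple random sample within each stratum,
    # capped at the number of individuals the stratum actually holds.
    chosen = []
    for stratum, size in ((low, n_low), (mid, n_mid), (high, n_high)):
        chosen.extend(rng.sample(stratum, min(size, len(stratum))))
    return sorted(chosen)


# Hypothetical cohort of 1000 simulated trait values.
cohort_rng = random.Random(1)
cohort = [cohort_rng.gauss(0.0, 1.0) for _ in range(1000)]

# Mostly extremes, plus a small draw from the middle stratum.
selected = select_phase_two(cohort, -1.0, 1.0, n_low=60, n_mid=10, n_high=60)
```

Setting `n_mid=0` gives a pure extreme sampling design, while a small positive `n_mid` corresponds to the variant with a small middle-stratum sample that the abstract reports as helpful for low-frequency and rare variants. The sketch covers only the selection step; the likelihood and pseudo-likelihood estimation methods are not shown.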