On the cross-population generalizability of gene expression prediction models
Main authors: | |
---|---|
Format: | Dataset |
Language: | eng |
Keywords: | |
Summary: | The genetic control of gene expression is a core component of human
physiology. For the past several years, transcriptome-wide association
studies have leveraged large datasets of linked genotype and RNA
sequencing information to create a powerful gene-based test of association
that has been used in dozens of studies. While numerous discoveries have
been made, the populations in the training data are overwhelmingly of
European descent, and little is known about the generalizability of these
models to other populations. Here, we test for cross-population
generalizability of gene expression prediction models using a dataset of
African American individuals with RNA-Seq data in whole blood. We find
that the default models trained in large datasets such as GTEx and DGN
fare poorly in African Americans, with a notable reduction in prediction
accuracy when compared to European Americans. We replicate these
limitations in cross-population generalizability using the five
populations in the GEUVADIS dataset. Via realistic simulations of both
populations and gene expression, we show that accurate cross-population
generalizability of transcriptome prediction only arises when eQTL
architecture is substantially shared across populations. In contrast,
models with non-identical eQTLs showed patterns similar to real-world
data. Therefore, generating RNA-Seq data in diverse populations is a
critical step towards multi-ethnic utility of gene expression prediction. |
---|---|
DOI: | 10.7272/q6rn362z |