Dissecting heritability, environmental risk, and air pollution causal effects using > 50 million individuals in MarketScan
Published in: | Nature Communications 2024-06, Vol.15 (1), p.5357-14, Article 5357 |
---|---|
Main authors: | , , , , , , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | Large national-level electronic health record (EHR) datasets offer new opportunities for disentangling the role of genes and environment through deep phenotype information and approximate pedigree structures. Here we use the approximate geographical locations of patients as a proxy for spatially correlated community-level environmental risk factors. We develop a spatial mixed linear effect (SMILE) model that incorporates both genetics and environmental contribution. We extract EHR and geographical locations from 257,620 nuclear families and compile 1083 disease outcome measurements from the MarketScan dataset. We augment the EHR with publicly available environmental data, including levels of particulate matter 2.5 (PM2.5), nitrogen dioxide (NO2), climate, and sociodemographic data. We refine the estimates of genetic heritability and quantify community-level environmental contributions. We also use wind speed and direction as instrumental variables to assess the causal effects of air pollution. In total, we find PM2.5 or NO2 have statistically significant causal effects on 135 diseases, including respiratory, musculoskeletal, digestive, metabolic, and sleep disorders, where PM2.5 and NO2 tend to affect biologically distinct disease categories. These analyses showcase several robust strategies for jointly modeling genetic and environmental effects on disease risk using large EHR datasets and will benefit upcoming biobank studies in the era of precision medicine.
Large national-level electronic health record datasets offer new opportunities for disentangling the roles of genes and environment in human diseases. Here, the authors propose a spatial mixed linear effect model (SMILE) to dissect genetic and environmental risk factors for diseases and assess the causality of air pollutants in an insurance claim dataset with 50 million individuals. |
---|---|
ISSN: | 2041-1723 |
DOI: | 10.1038/s41467-024-49566-6 |