Bi‐level structured functional analysis for genome‐wide association studies

Genome‐wide association studies (GWAS) have led to great successes in identifying genotype–phenotype associations for complex human diseases. In such studies, the high dimensionality of single nucleotide polymorphisms (SNPs) often makes analysis difficult. Functional analysis, which interprets SNPs...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Biometrics 2023-12, Vol.79 (4), p.3359-3373
Hauptverfasser: Wu, Mengyun, Wang, Fan, Ge, Yeheng, Ma, Shuangge, Li, Yang
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Genome‐wide association studies (GWAS) have led to great successes in identifying genotype–phenotype associations for complex human diseases. In such studies, the high dimensionality of single nucleotide polymorphisms (SNPs) often makes analysis difficult. Functional analysis, which interprets SNPs densely distributed in a chromosomal region as a continuous process rather than discrete observations, has emerged as a promising avenue for overcoming the high dimensionality challenges. However, the majority of the existing functional studies continue to be individual SNP based and are unable to sufficiently account for the intricate underpinning structures of SNP data. SNPs are often found in groups (e.g., genes or pathways) and have a natural group structure. Additionally, these SNP groups can be highly correlated with coordinated biological functions and interact in a network. Motivated by these unique characteristics of SNP data, we develop a novel bi‐level structured functional analysis method and investigate disease‐associated genetic variants at the SNP level and SNP group level simultaneously. The penalization technique is adopted for bi‐level selection and also to accommodate the group‐level network structure. Both the estimation and selection consistency properties are rigorously established. The superiority of the proposed method over alternatives is shown through extensive simulation studies. A type 2 diabetes SNP data application yields some biologically intriguing results.
ISSN:0006-341X
1541-0420
DOI:10.1111/biom.13871