The Translational Machine: A novel machine‐learning approach to illuminate complex genetic architectures
Published in: | Genetic Epidemiology 2021-07, Vol.45 (5), p.485-536 |
---|---|
Main authors: | , , , |
Format: | Article |
Language: | eng |
Summary: | The Translational Machine (TM) is a machine learning (ML)‐based analytic pipeline that translates genotypic/variant call data into biologically contextualized features that richly characterize complex variant architectures and permit greater interpretability and biological replication. It also reduces potentially confounding effects of population substructure on outcome prediction. The TM consists of three main components. First, replicable but flexible feature engineering procedures translate genome‐scale data into biologically informative features that appropriately contextualize simple variant calls/genotypes within biological and functional contexts. Second, model‐free, nonparametric ML‐based feature filtering procedures empirically reduce dimensionality and noise of both original genotype calls and engineered features. Third, a powerful ML algorithm for feature selection is used to differentiate risk variant contributions across variant frequency and functional prediction spectra. The TM simultaneously evaluates potential contributions of variants operative under polygenic and heterogeneous models of genetic architecture. Our TM enables integration of biological information (e.g., genomic annotations) within conceptual frameworks akin to geneset‐/pathways‐based and collapsing methods, but overcomes some of these methods' limitations. The full TM pipeline is executed in R. Our approach and initial findings from its application to a whole‐exome schizophrenia case–control data set are presented. These TM procedures extend the findings of the primary investigation and yield novel results. |
---|---|
ISSN: | 0741-0395, 1098-2272 |
DOI: | 10.1002/gepi.22383 |
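
The summary above describes feature engineering followed by model-free, nonparametric filtering, but this record does not name the specific procedures. The following is only a minimal illustrative sketch of those two ideas, written in Python although the authors' pipeline runs in R. It assumes a simple per-gene burden score as the engineered feature and a rank-based (Mann-Whitney style) statistic as the filter. All function names, identifiers, thresholds, and toy data are hypothetical and are not taken from the article.

```python
from collections import defaultdict


def gene_burden(genotypes, variant_gene):
    """Collapse per-variant allele counts (0/1/2) into per-gene burden scores.

    genotypes:    {sample_id: {variant_id: alt_allele_count}}
    variant_gene: {variant_id: gene_name} (hypothetical annotation map)
    """
    burden = {}
    for sample, calls in genotypes.items():
        per_gene = defaultdict(int)
        for variant, count in calls.items():
            gene = variant_gene.get(variant)
            if gene is not None:  # skip variants without an annotation
                per_gene[gene] += count
        burden[sample] = dict(per_gene)
    return burden


def rank_auc(case_values, control_values):
    """Mann-Whitney U scaled to [0, 1]: P(case > control) + 0.5 * P(tie)."""
    wins = 0.0
    for c in case_values:
        for k in control_values:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


def filter_features(burden, labels, min_shift=0.1):
    """Keep genes whose rank AUC differs from 0.5 by at least min_shift.

    min_shift is an arbitrary illustrative threshold, not a published value.
    """
    genes = sorted({g for scores in burden.values() for g in scores})
    kept = {}
    for gene in genes:
        cases = [burden[s].get(gene, 0) for s in burden if labels[s] == 1]
        controls = [burden[s].get(gene, 0) for s in burden if labels[s] == 0]
        auc = rank_auc(cases, controls)
        if abs(auc - 0.5) >= min_shift:
            kept[gene] = auc
    return kept


# Toy data (invented for illustration only).
genotypes = {
    "s1": {"v1": 1, "v2": 0, "v3": 1},
    "s2": {"v1": 0, "v2": 1, "v3": 0},
    "s3": {"v1": 0, "v2": 0, "v3": 1},
    "s4": {"v1": 0, "v2": 0, "v3": 0},
}
variant_gene = {"v1": "GENE_A", "v2": "GENE_A", "v3": "GENE_B"}
labels = {"s1": 1, "s2": 1, "s3": 0, "s4": 0}  # 1 = case, 0 = control

burden = gene_burden(genotypes, variant_gene)
selected = filter_features(burden, labels)
print(selected)
```

In this toy example the two cases carry different variants in the same gene, so neither variant alone separates cases from controls, while the collapsed gene-level score does. This is the kind of heterogeneity that collapsing-style features are meant to capture. A subsequent ML-based feature selection step, as described in the summary, is not sketched here.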