Phenotype prediction using biologically interpretable neural networks on multi-cohort multi-omics data

Bibliographic details
Published in:	NPJ systems biology and applications 2024-08, Vol.10 (1), p.81-10, Article 81
Main authors:	van Hilten, Arno; van Rooij, Jeroen; Ikram, M. Arfan; Niessen, Wiro J.; van Meurs, Joyce B. J.; Roshchupkin, Gennady V.
Format:	Article
Language:	English
Subjects:	631/114/794; 692/53; Adult; Aged; Bioinformatics; Biological analysis; Biomedical and Life Sciences; Blood levels; CD34 antigen; Cohort Studies; Computational Biology - methods; Computational Biology/Bioinformatics; Computer Appl. in Life Sciences; CpG Islands - genetics; Decision making; DNA methylation; DNA Methylation - genetics; Female; Genomics - methods; Humans; Life Sciences; Low density lipoprotein; Male; Middle Aged; Multiomics; Neural networks; Neural Networks, Computer; Phenotype; Phenotypes; Precision medicine; Prediction models; Smoking; Smoking - genetics; Systems Biology
Online access:	Full text

Description
Summary:	Integrating multi-omics data into predictive models has the potential to enhance accuracy, which is essential for precision medicine. In this study, we developed interpretable predictive models for multi-omics data by employing neural networks informed by prior biological knowledge, referred to as visible networks. These neural networks offer insights into the decision-making process and can unveil novel perspectives on the underlying biological mechanisms associated with traits and complex diseases. We tested the performance, interpretability and generalizability for inferring smoking status, subject age and LDL levels using genome-wide RNA expression and CpG methylation data from the blood of the BIOS consortium (four population cohorts, total N = 2940). In a cohort-wise cross-validation setting, the consistency of the diagnostic performance and interpretation was assessed. Performance was consistently high for predicting smoking status, with an overall mean AUC of 0.95 (95% CI: 0.90–1.00), and interpretation revealed the involvement of well-replicated genes such as AHRR, GPR15 and LRRN3. LDL-level predictions were only generalized in a single cohort, with an R² of 0.07 (95% CI: 0.05–0.08). Age was inferred with a mean error of 5.16 (95% CI: 3.97–6.35) years, with the genes COL11A2, AFAP1, OTUD7A, PTPRN2, ADARB2 and CD34 consistently predictive. For both regression tasks, we found that using multi-omics networks improved performance, stability and generalizability compared to interpretable single-omic networks. We believe that visible neural networks have great potential for multi-omics analysis; they combine multi-omic data elegantly, are interpretable, and generalize well to data from different cohorts.
ISSN:	2056-7189
DOI:	10.1038/s41540-024-00405-w
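
Note on the cohort-wise cross-validation mentioned in the summary: the record does not say how the folds were constructed. The following is a minimal sketch, assuming each cohort is held out once in turn while the remaining cohorts are used for training (leave-one-cohort-out). The function name `leave_one_cohort_out` and the cohort labels are hypothetical and are not taken from the article or the BIOS data.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterator, List, Sequence, Tuple


def leave_one_cohort_out(
    cohorts: Sequence[Hashable],
) -> Iterator[Tuple[Hashable, List[int], List[int]]]:
    """Yield (held_out_cohort, train_indices, test_indices), one fold per cohort."""
    # Group sample indices by their cohort label (insertion order is preserved).
    by_cohort: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, cohort in enumerate(cohorts):
        by_cohort[cohort].append(idx)

    for held_out, test_idx in by_cohort.items():
        # Train on every sample that does not belong to the held-out cohort.
        train_idx = [
            i
            for cohort, members in by_cohort.items()
            if cohort != held_out
            for i in members
        ]
        yield held_out, train_idx, list(test_idx)


# Hypothetical cohort labels for eight samples (assumption, not real data).
labels = ["A", "A", "B", "B", "C", "C", "D", "D"]
folds = list(leave_one_cohort_out(labels))
```

With four cohorts this yields four folds. A model would be fitted on each `train_idx` and scored on the matching `test_idx`, so every score reflects performance on a cohort the model has not seen.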