Self-Normalizing Foundation Model for Enhanced Multi-Omics Data Analysis in Oncology
Format: | Article |
---|---|
Language: | English |
Abstract: | Multi-omics research has enhanced our understanding of cancer heterogeneity
and progression. Investigating molecular data through multi-omics approaches is
crucial for unraveling the complex biological mechanisms underlying cancer,
thereby enabling more effective diagnosis, treatment, and prevention
strategies. However, predicting patient outcomes by integrating all available
multi-omics data remains an understudied research direction. Here, we present
SeNMo, a foundation model trained on multi-omics data across 33 cancer types.
SeNMo is particularly efficient at handling multi-omics data with high-width,
low-length attributes, i.e., many features but few samples. We trained SeNMo
to predict patients' overall survival using pan-cancer multi-omics data from
33 cancer sites in the Genomic Data Commons (GDC). The training data include
gene expression, DNA methylation, miRNA expression, DNA mutation, and protein
expression modalities, together with clinical data. SeNMo was validated on two
independent cohorts: Moffitt Cancer Center and CPTAC lung squamous cell
carcinoma. We evaluated the model's performance in predicting overall survival
using the concordance index (C-Index). SeNMo performed consistently well in the
training regime, with a validation C-Index of 0.76 on GDC's public data. In the
testing regime, SeNMo achieved a C-Index of 0.758 on a held-out test set. On
the pan-cancer test cohort, the model also classified patients' primary cancer
type with an average accuracy of 99.8%. SeNMo further demonstrated significant
performance in predicting tertiary lymphoid structures from multi-omics data,
showing generalizability across cancer types, molecular data types, and
clinical endpoints. |
---|---|
DOI: | 10.48550/arxiv.2405.08226 |
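The C-Index reported in the abstract measures how well a model's predicted risk scores rank patients by observed survival time: 0.5 corresponds to random ordering and 1.0 to perfect ordering. The sketch below is a minimal, generic implementation of Harrell's C-index for right-censored data, written only to illustrate the metric. It is not the SeNMo authors' evaluation code. The function name `harrell_c_index` is hypothetical, and pairs with tied survival times are simply skipped, which is a simplification relative to library implementations.

```python
from itertools import combinations
from typing import Sequence


def harrell_c_index(
    times: Sequence[float], events: Sequence[int], risk: Sequence[float]
) -> float:
    """Harrell's concordance index for right-censored survival data.

    times:  observed follow-up time per patient
    events: 1 if death was observed, 0 if the patient was censored
    risk:   predicted risk score (higher means shorter expected survival)

    Simplification (assumption): pairs with tied observed times are skipped.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the patient with the shorter time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b] or not events[a]:
            # Not comparable: tied times, or the earlier patient was censored,
            # so we cannot tell who actually survived longer.
            continue
        comparable += 1
        if risk[a] > risk[b]:
            concordant += 1.0  # higher risk died earlier: correct ranking
        elif risk[a] == risk[b]:
            concordant += 0.5  # tied predictions count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (illustrative only, not from the study): the third patient is censored.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
print(harrell_c_index(times, events, [0.9, 0.6, 0.4, 0.1]))  # 1.0
print(harrell_c_index(times, events, [0.9, 0.4, 0.6, 0.1]))  # 0.8
```

Counting only pairs where the earlier time is an observed event is what lets the metric use censored patients without guessing their true survival times; the reported values of 0.76 and 0.758 mean that roughly three in four comparable patient pairs were ranked correctly.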