Multi-modal Transfer Learning between Biological Foundation Models
Format: Article
Language: English
Abstract: Biological sequences encode fundamental instructions for the building blocks
of life, in the form of DNA, RNA, and proteins. Modeling these sequences is key
to understanding disease mechanisms and is an active research area in
computational biology. Recently, Large Language Models have shown great promise
in solving certain biological tasks, but current approaches are limited to a
single sequence modality (DNA, RNA, or protein). Key problems in genomics
intrinsically involve multiple modalities, but it remains unclear how to adapt
general-purpose sequence models to those cases. In this work we propose a
multi-modal model that connects DNA, RNA, and proteins by leveraging
information from different pre-trained modality-specific encoders. We
demonstrate its capabilities by applying it to the largely unsolved problem of
predicting how multiple RNA transcript isoforms originate from the same gene
(i.e. the same DNA sequence) and map to different transcript expression levels
across various human tissues. We show that our model, dubbed IsoFormer, is able
to accurately predict differential transcript expression, outperforming
existing methods and making effective use of multiple modalities. Our framework
also achieves efficient knowledge transfer from the encoders' pre-training as
well as between modalities. We open-source our model, paving the way for new
multi-modal gene expression approaches.
DOI: 10.48550/arxiv.2406.14150
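The abstract describes a model that combines pre-trained, modality-specific encoders for DNA, RNA, and protein to predict transcript expression across tissues. The sketch below only illustrates that general data flow under loudly stated assumptions. It is not IsoFormer's architecture or API, and this record does not describe the paper's actual encoders or aggregation module. The encoders are toy mean-pooled random embedding tables (hypothetical `ToyEncoder`). Fusion is plain concatenation followed by a linear layer (hypothetical `LinearHead`), and the number of tissues is arbitrary.

```python
import random
from typing import List

# Standard 20 amino-acid one-letter codes (alphabet for the toy protein encoder).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


class ToyEncoder:
    """Hypothetical stand-in for a pre-trained modality-specific encoder.

    Each symbol of a fixed alphabet gets a random embedding; a sequence is
    encoded by mean-pooling its symbol embeddings into one fixed-size vector.
    """

    def __init__(self, alphabet: str, dim: int, seed: int) -> None:
        rng = random.Random(seed)
        self.table = {ch: [rng.gauss(0.0, 1.0) for _ in range(dim)] for ch in alphabet}

    def encode(self, sequence: str) -> List[float]:
        if not sequence:
            raise ValueError("sequence must be non-empty")
        # Raises KeyError for symbols outside the alphabet.
        vectors = [self.table[ch] for ch in sequence]
        # Column-wise mean over all symbol embeddings.
        return [sum(column) / len(vectors) for column in zip(*vectors)]


class LinearHead:
    """Hypothetical linear layer mapping fused features to per-tissue scores."""

    def __init__(self, in_dim: int, n_tissues: int, seed: int) -> None:
        rng = random.Random(seed)
        self.weights = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(n_tissues)]
        self.bias = [0.0] * n_tissues

    def __call__(self, features: List[float]) -> List[float]:
        return [sum(w * x for w, x in zip(row, features)) + b
                for row, b in zip(self.weights, self.bias)]


class ToyIsoformModel:
    """Fuses DNA, RNA and protein embeddings by concatenation (an assumption)."""

    def __init__(self, dim: int = 8, n_tissues: int = 3, seed: int = 0) -> None:
        self.dna = ToyEncoder("ACGT", dim, seed)
        self.rna = ToyEncoder("ACGU", dim, seed + 1)
        self.protein = ToyEncoder(AMINO_ACIDS, dim, seed + 2)
        self.head = LinearHead(3 * dim, n_tissues, seed + 3)

    def predict(self, dna: str, rna: str, protein: str) -> List[float]:
        # One fused feature vector of length 3 * dim, then one score per tissue.
        features = self.dna.encode(dna) + self.rna.encode(rna) + self.protein.encode(protein)
        return self.head(features)


if __name__ == "__main__":
    model = ToyIsoformModel()
    gene = "ACGTACGTTTGACCA"  # toy DNA sequence shared by both isoforms
    # Two toy isoforms of the same gene, each paired with its translated protein.
    print(model.predict(gene, "AUGGCCUAA", "MA"))
    print(model.predict(gene, "AUGUAA", "M"))
```

Because the DNA input is shared by every isoform of a gene, only the isoform-specific RNA and protein inputs make the two predictions differ. Concatenation is the simplest possible fusion choice. More elaborate schemes, such as attention across the modality embeddings, are possible but are not shown here.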