Translation between Molecules and Natural Language
Saved in:
Main Authors: | |
---|---|
Format: | Article |
Language: | English |
Subjects: | |
Online Access: | Order full text |
Abstract: We present $\textbf{MolT5}$, a self-supervised learning framework for pretraining models on a vast amount of unlabeled natural language text and molecule strings. $\textbf{MolT5}$ allows for new, useful, and challenging analogs of traditional vision-language tasks, such as molecule captioning and text-based de novo molecule generation (together: translation between molecules and language), which we explore for the first time. Since $\textbf{MolT5}$ pretrains models on single-modal data, it helps overcome the data-scarcity problem of the chemistry domain. Furthermore, we consider several metrics, including a new cross-modal embedding-based metric, to evaluate the tasks of molecule captioning and text-based molecule generation. Our results show that $\textbf{MolT5}$-based models are able to generate outputs, both molecules and captions, that in many cases are of high quality.
DOI: 10.48550/arxiv.2204.11817
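The two translation tasks described in the abstract map onto standard sequence-to-sequence generation. The following is a minimal sketch of text-based de novo molecule generation using a MolT5-style checkpoint loaded through Hugging Face transformers; the checkpoint identifier and the example caption are illustrative assumptions rather than values taken from the record above.

```python
# Minimal sketch: generate a SMILES string from a natural-language description
# with a MolT5-style T5 checkpoint. The model name below is an assumption;
# substitute whichever publicly released MolT5 checkpoint you actually use.
from transformers import T5Tokenizer, T5ForConditionalGeneration

model_name = "laituan245/molt5-base-caption2smiles"  # assumed checkpoint id
tokenizer = T5Tokenizer.from_pretrained(model_name, model_max_length=512)
model = T5ForConditionalGeneration.from_pretrained(model_name)

# Hypothetical input caption describing the target molecule.
caption = (
    "The molecule is a monocarboxylic acid that is acetic acid in which one "
    "of the methyl hydrogens is replaced by a phenyl group."
)
inputs = tokenizer(caption, return_tensors="pt")

# Beam search tends to give more chemically plausible strings than greedy decoding.
outputs = model.generate(**inputs, num_beams=5, max_length=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The reverse task, molecule captioning, follows the same pattern with the input and output roles swapped: feed a SMILES string to a captioning checkpoint (e.g., a smiles2caption variant) and decode the generated text.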