Leveraging Biomolecule and Natural Language through Multi-Modal Learning: A Survey
Main authors: | , , , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Abstract: | The integration of biomolecular modeling with natural language (BL) has
emerged as a promising interdisciplinary area at the intersection of artificial
intelligence, chemistry and biology. This approach leverages the rich,
multifaceted descriptions of biomolecules contained within textual data sources
to enhance our fundamental understanding and enable downstream computational
tasks such as biomolecule property prediction. The fusion of the nuanced
narratives expressed through natural language with the structural and
functional specifics of biomolecules described via various molecular modeling
techniques opens new avenues for comprehensively representing and analyzing
biomolecules. By incorporating the contextual language data that surrounds
biomolecules into their modeling, BL aims to capture a holistic view
encompassing both the symbolic qualities conveyed through language and the
quantitative structural characteristics. In this review, we provide an
extensive analysis of recent advancements achieved through cross-modeling of
biomolecules and natural language. (1) We begin by outlining the technical
representations of biomolecules employed, including sequences, 2D graphs, and
3D structures. (2) We then examine in depth the rationale and key objectives
underlying effective multi-modal integration of language and molecular data
sources. (3) We subsequently survey the practical applications enabled to date
in this developing research area. (4) We also compile and summarize the
available resources and datasets to facilitate future work. (5) Looking ahead,
we identify several promising research directions worthy of further exploration
and investment to continue advancing the field. Related resources and
content are kept up to date at
https://github.com/QizhiPei/Awesome-Biomolecule-Language-Cross-Modeling. |
---|---|
DOI: | 10.48550/arxiv.2403.01528 |