Leveraging Biomolecule and Natural Language through Multi-Modal Learning: A Survey

The integration of biomolecular modeling with natural language (BL) has emerged as a promising interdisciplinary area at the intersection of artificial intelligence, chemistry and biology. This approach leverages the rich, multifaceted descriptions of biomolecules contained within textual data sourc...

Bibliographic details
Main authors: Pei, Qizhi, Wu, Lijun, Gao, Kaiyuan, Zhu, Jinhua, Wang, Yue, Wang, Zun, Qin, Tao, Yan, Rui
Format: Article
Language: eng
container_end_page
container_issue
container_start_page
container_title
container_volume
creator Pei, Qizhi
Wu, Lijun
Gao, Kaiyuan
Zhu, Jinhua
Wang, Yue
Wang, Zun
Qin, Tao
Yan, Rui
description The integration of biomolecular modeling with natural language (BL) has emerged as a promising interdisciplinary area at the intersection of artificial intelligence, chemistry, and biology. This approach leverages the rich, multifaceted descriptions of biomolecules found in textual data sources to deepen our fundamental understanding and to enable downstream computational tasks such as biomolecule property prediction. Fusing the nuanced narratives expressed in natural language with the structural and functional specifics of biomolecules captured by various molecular modeling techniques opens new avenues for comprehensively representing and analyzing biomolecules. By incorporating the contextual language data that surrounds biomolecules into their modeling, BL aims to capture a holistic view that encompasses both the symbolic qualities conveyed through language and quantitative structural characteristics. In this review, we provide an extensive analysis of recent advances in the cross-modeling of biomolecules and natural language. (1) We begin by outlining the technical representations of biomolecules employed, including sequences, 2D graphs, and 3D structures. (2) We then examine in depth the rationale and key objectives underlying effective multi-modal integration of language and molecular data sources. (3) We subsequently survey the practical applications enabled to date in this developing research area. (4) We also compile and summarize the available resources and datasets to facilitate future work. (5) Looking ahead, we identify several promising research directions worthy of further exploration and investment to continue advancing the field. Related resources and content are kept up to date at https://github.com/QizhiPei/Awesome-Biomolecule-Language-Cross-Modeling.
doi_str_mv 10.48550/arxiv.2403.01528
format Article
fullrecord <record>
<control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2403_01528</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2403_01528</sourcerecordid><originalsourceid>FETCH-LOGICAL-a678-c4f4d59b56b5f55b86a807a9bb6cf52698670cd002a1003ef7b32e7b943d88383</originalsourceid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control>
<display><type>article</type><title>Leveraging Biomolecule and Natural Language through Multi-Modal Learning: A Survey</title><source>arXiv.org</source><creator>Pei, Qizhi ; Wu, Lijun ; Gao, Kaiyuan ; Zhu, Jinhua ; Wang, Yue ; Wang, Zun ; Qin, Tao ; Yan, Rui</creator><identifier>DOI: 10.48550/arxiv.2403.01528</identifier><language>eng</language><subject>Computer Science - Artificial Intelligence ; Computer Science - Computation and Language ; Quantitative Biology - Biomolecules</subject><creationdate>2024-03</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display>
<links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,778,883</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2403.01528$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2403.01528$$DView paper in arXiv$$Hfree_for_read</backlink></links>
<delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery>
<addata><au>Pei, Qizhi</au><au>Wu, Lijun</au><au>Gao, Kaiyuan</au><au>Zhu, Jinhua</au><au>Wang, Yue</au><au>Wang, Zun</au><au>Qin, Tao</au><au>Yan, Rui</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Leveraging Biomolecule and Natural Language through Multi-Modal Learning: A Survey</atitle><date>2024-03-03</date><risdate>2024</risdate><doi>10.48550/arxiv.2403.01528</doi><oa>free_for_read</oa></addata>
</record>
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2403.01528
ispartof
issn
language eng
recordid cdi_arxiv_primary_2403_01528
source arXiv.org
subjects Computer Science - Artificial Intelligence
Computer Science - Computation and Language
Quantitative Biology - Biomolecules
title Leveraging Biomolecule and Natural Language through Multi-Modal Learning: A Survey
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-15T11%3A53%3A03IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Leveraging%20Biomolecule%20and%20Natural%20Language%20through%20Multi-Modal%20Learning:%20A%20Survey&rft.au=Pei,%20Qizhi&rft.date=2024-03-03&rft_id=info:doi/10.48550/arxiv.2403.01528&rft_dat=%3Carxiv_GOX%3E2403_01528%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true