Multi-view biomedical foundation models for molecule-target and property prediction

Foundation models applied to bio-molecular space hold promise to accelerate drug discovery. Molecular representation is key to building such models. Previous works have typically focused on a single representation or view of the molecules. Here, we develop a multi-view foundation model approach, tha...

Bibliographic details
Main authors:	Suryanarayanan, Parthasarathy; Qiu, Yunguang; Sethi, Shreyans; Mahajan, Diwakar; Li, Hongyang; Yang, Yuxin; Eyigoz, Elif; Saenz, Aldo Guzman; Platt, Daniel E; Rumbell, Timothy H; Ng, Kenney; Dey, Sanjoy; Burch, Myson; Kwon, Bum Chul; Meyer, Pablo; Cheng, Feixiong; Hu, Jianying; Morrone, Joseph A
Format:	Article
Language:	eng
Subjects:	Computer Science - Artificial Intelligence; Computer Science - Learning; Quantitative Biology - Biomolecules

container_end_page
container_issue
container_start_page
container_title
container_volume
creator	Suryanarayanan, Parthasarathy; Qiu, Yunguang; Sethi, Shreyans; Mahajan, Diwakar; Li, Hongyang; Yang, Yuxin; Eyigoz, Elif; Saenz, Aldo Guzman; Platt, Daniel E; Rumbell, Timothy H; Ng, Kenney; Dey, Sanjoy; Burch, Myson; Kwon, Bum Chul; Meyer, Pablo; Cheng, Feixiong; Hu, Jianying; Morrone, Joseph A
description	Foundation models applied to bio-molecular space hold promise to accelerate drug discovery. Molecular representation is key to building such models. Previous works have typically focused on a single representation or view of the molecules. Here, we develop a multi-view foundation model approach that integrates molecular views of graph, image and text. Single-view foundation models are each pre-trained on a dataset of up to 200M molecules and then aggregated into combined representations. Our multi-view model is validated on a diverse set of 18 tasks, encompassing ligand-protein binding, molecular solubility, metabolism and toxicity. We show that the multi-view models perform robustly and are able to balance the strengths and weaknesses of specific views. We then apply this model to screen compounds against a large (>100 targets) set of G protein-coupled receptors (GPCRs). From this library of targets, we identify 33 that are related to Alzheimer's disease. On this subset, we employ our model to identify strong binders, which are validated through structure-based modeling and identification of key binding motifs.
doi_str_mv	10.48550/arxiv.2410.19704
format	Article
creationdate	2024-10-25
sourcetype	Open Access Repository
rights	http://arxiv.org/licenses/nonexclusive-distrib/1.0
oa	free_for_read
collection	arXiv Computer Science; arXiv Quantitative Biology; arXiv.org
linktorsrc	https://arxiv.org/abs/2410.19704
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2410.19704
ispartof
issn
language	eng
recordid	cdi_arxiv_primary_2410_19704
source	arXiv.org
subjects	Computer Science - Artificial Intelligence; Computer Science - Learning; Quantitative Biology - Biomolecules
title	Multi-view biomedical foundation models for molecule-target and property prediction
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-02T05%3A11%3A21IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Multi-view%20biomedical%20foundation%20models%20for%20molecule-target%20and%20property%20prediction&rft.au=Suryanarayanan,%20Parthasarathy&rft.date=2024-10-25&rft_id=info:doi/10.48550/arxiv.2410.19704&rft_dat=%3Carxiv_GOX%3E2410_19704%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true