Exploring Chemical Space with Score-based Out-of-distribution Generation

A well-known limitation of existing molecular generative models is that the generated molecules highly resemble those in the training set. To generate truly novel molecules that may have even better properties for de novo drug discovery, more powerful exploration in the chemical space is necessary....

Bibliographic details
Main authors: Lee, Seul; Jo, Jaehyeong; Hwang, Sung Ju
Format: Article
Language: eng
Subjects:
container_end_page
container_issue
container_start_page
container_title
container_volume
creator Lee, Seul
Jo, Jaehyeong
Hwang, Sung Ju
description A well-known limitation of existing molecular generative models is that the generated molecules closely resemble those in the training set. To generate truly novel molecules that may have even better properties for de novo drug discovery, more powerful exploration of chemical space is necessary. To this end, we propose Molecular Out-Of-distribution Diffusion (MOOD), a score-based diffusion scheme that incorporates out-of-distribution (OOD) control into the generative stochastic differential equation (SDE) through simple control of a hyperparameter, and thus requires no additional cost. Since some novel molecules may not meet the basic requirements of real-world drugs, MOOD performs conditional generation by using the gradients of a property predictor to guide the reverse-time diffusion process toward high-scoring regions according to target properties such as protein-ligand interactions, drug-likeness, and synthesizability. This allows MOOD to search for novel and meaningful molecules rather than generating unseen yet trivial ones. We experimentally validate that MOOD is able to explore chemical space beyond the training distribution, generating molecules that outscore those found by existing methods, and even the top 0.01% of the original training pool. Our code is available at https://github.com/SeulLee05/MOOD.
doi_str_mv 10.48550/arxiv.2206.07632
format Article
fullrecord <record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2206_07632</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2206_07632</sourcerecordid><originalsourceid>FETCH-LOGICAL-a672-535962ef6e723ddaeabae9402fb4f832289a3f5923ef011841ab96e7a739bc1a3</originalsourceid><addsrcrecordid>eNotz7FOwzAUBVAvDKjlA5jwDzjYz7ETj1VUWqRKHdo9ek6eqaU0iZwUyt9DC9O9w9WVDmPPSmZ5aYx8xXSNnxmAtJksrIZHtl1fx25Isf_g1YnOscGOH0ZsiH_F-cQPzZBIeJyo5fvLLIYg2jjNKfrLHIeeb6inhLe6ZA8Bu4me_nPBjm_rY7UVu_3mvVrtBNoChNHGWaBgqQDdtkjokVwuIfg8lBqgdKiDcaApSKXKXKF3v2MstPONQr1gL3-3d0o9pnjG9F3fSPWdpH8As71G7A</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Exploring Chemical Space with Score-based Out-of-distribution Generation</title><source>arXiv.org</source><creator>Lee, Seul ; Jo, Jaehyeong ; Hwang, Sung Ju</creator><creatorcontrib>Lee, Seul ; Jo, Jaehyeong ; Hwang, Sung Ju</creatorcontrib><description>A well-known limitation of existing molecular generative models is that the generated molecules highly resemble those in the training set. To generate truly novel molecules that may have even better properties for de novo drug discovery, more powerful exploration in the chemical space is necessary. To this end, we propose Molecular Out-Of-distribution Diffusion(MOOD), a score-based diffusion scheme that incorporates out-of-distribution (OOD) control in the generative stochastic differential equation (SDE) with simple control of a hyperparameter, thus requires no additional costs. Since some novel molecules may not meet the basic requirements of real-world drugs, MOOD performs conditional generation by utilizing the gradients from a property predictor that guides the reverse-time diffusion process to high-scoring regions according to target properties such as protein-ligand interactions, drug-likeness, and synthesizability. 
This allows MOOD to search for novel and meaningful molecules rather than generating unseen yet trivial ones. We experimentally validate that MOOD is able to explore the chemical space beyond the training distribution, generating molecules that outscore ones found with existing methods, and even the top 0.01% of the original training pool. Our code is available at https://github.com/SeulLee05/MOOD.</description><identifier>DOI: 10.48550/arxiv.2206.07632</identifier><language>eng</language><subject>Computer Science - Learning ; Physics - Chemical Physics ; Quantitative Biology - Biomolecules</subject><creationdate>2022-06</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,776,881</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2206.07632$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2206.07632$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Lee, Seul</creatorcontrib><creatorcontrib>Jo, Jaehyeong</creatorcontrib><creatorcontrib>Hwang, Sung Ju</creatorcontrib><title>Exploring Chemical Space with Score-based Out-of-distribution Generation</title><description>A well-known limitation of existing molecular generative models is that the generated molecules highly resemble those in the training set. To generate truly novel molecules that may have even better properties for de novo drug discovery, more powerful exploration in the chemical space is necessary. 
To this end, we propose Molecular Out-Of-distribution Diffusion(MOOD), a score-based diffusion scheme that incorporates out-of-distribution (OOD) control in the generative stochastic differential equation (SDE) with simple control of a hyperparameter, thus requires no additional costs. Since some novel molecules may not meet the basic requirements of real-world drugs, MOOD performs conditional generation by utilizing the gradients from a property predictor that guides the reverse-time diffusion process to high-scoring regions according to target properties such as protein-ligand interactions, drug-likeness, and synthesizability. This allows MOOD to search for novel and meaningful molecules rather than generating unseen yet trivial ones. We experimentally validate that MOOD is able to explore the chemical space beyond the training distribution, generating molecules that outscore ones found with existing methods, and even the top 0.01% of the original training pool. Our code is available at https://github.com/SeulLee05/MOOD.</description><subject>Computer Science - Learning</subject><subject>Physics - Chemical Physics</subject><subject>Quantitative Biology - Biomolecules</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotz7FOwzAUBVAvDKjlA5jwDzjYz7ETj1VUWqRKHdo9ek6eqaU0iZwUyt9DC9O9w9WVDmPPSmZ5aYx8xXSNnxmAtJksrIZHtl1fx25Isf_g1YnOscGOH0ZsiH_F-cQPzZBIeJyo5fvLLIYg2jjNKfrLHIeeb6inhLe6ZA8Bu4me_nPBjm_rY7UVu_3mvVrtBNoChNHGWaBgqQDdtkjokVwuIfg8lBqgdKiDcaApSKXKXKF3v2MstPONQr1gL3-3d0o9pnjG9F3fSPWdpH8As71G7A</recordid><startdate>20220606</startdate><enddate>20220606</enddate><creator>Lee, Seul</creator><creator>Jo, Jaehyeong</creator><creator>Hwang, Sung Ju</creator><scope>AKY</scope><scope>ALC</scope><scope>GOX</scope></search><sort><creationdate>20220606</creationdate><title>Exploring Chemical Space with Score-based Out-of-distribution Generation</title><author>Lee, 
Seul ; Jo, Jaehyeong ; Hwang, Sung Ju</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a672-535962ef6e723ddaeabae9402fb4f832289a3f5923ef011841ab96e7a739bc1a3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Computer Science - Learning</topic><topic>Physics - Chemical Physics</topic><topic>Quantitative Biology - Biomolecules</topic><toplevel>online_resources</toplevel><creatorcontrib>Lee, Seul</creatorcontrib><creatorcontrib>Jo, Jaehyeong</creatorcontrib><creatorcontrib>Hwang, Sung Ju</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv Quantitative Biology</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Lee, Seul</au><au>Jo, Jaehyeong</au><au>Hwang, Sung Ju</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Exploring Chemical Space with Score-based Out-of-distribution Generation</atitle><date>2022-06-06</date><risdate>2022</risdate><abstract>A well-known limitation of existing molecular generative models is that the generated molecules highly resemble those in the training set. To generate truly novel molecules that may have even better properties for de novo drug discovery, more powerful exploration in the chemical space is necessary. To this end, we propose Molecular Out-Of-distribution Diffusion(MOOD), a score-based diffusion scheme that incorporates out-of-distribution (OOD) control in the generative stochastic differential equation (SDE) with simple control of a hyperparameter, thus requires no additional costs. 
Since some novel molecules may not meet the basic requirements of real-world drugs, MOOD performs conditional generation by utilizing the gradients from a property predictor that guides the reverse-time diffusion process to high-scoring regions according to target properties such as protein-ligand interactions, drug-likeness, and synthesizability. This allows MOOD to search for novel and meaningful molecules rather than generating unseen yet trivial ones. We experimentally validate that MOOD is able to explore the chemical space beyond the training distribution, generating molecules that outscore ones found with existing methods, and even the top 0.01% of the original training pool. Our code is available at https://github.com/SeulLee05/MOOD.</abstract><doi>10.48550/arxiv.2206.07632</doi><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2206.07632
ispartof
issn
language eng
recordid cdi_arxiv_primary_2206_07632
source arXiv.org
subjects Computer Science - Learning
Physics - Chemical Physics
Quantitative Biology - Biomolecules
title Exploring Chemical Space with Score-based Out-of-distribution Generation
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-14T08%3A23%3A03IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Exploring%20Chemical%20Space%20with%20Score-based%20Out-of-distribution%20Generation&rft.au=Lee,%20Seul&rft.date=2022-06-06&rft_id=info:doi/10.48550/arxiv.2206.07632&rft_dat=%3Carxiv_GOX%3E2206_07632%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
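
The description field above says that sampling is steered by two things: a hyperparameter that controls how far generation may move away from the training distribution, and the gradient of a property predictor. The following is a minimal toy sketch of that idea, not the authors' implementation (the record points to https://github.com/SeulLee05/MOOD for that). It assumes a one-dimensional standard Gaussian as a stand-in for the training distribution, a hypothetical quadratic property predictor, an assumed OOD weighting of the form 1 - sqrt(lambda), and plain Langevin dynamics swapped in for the reverse-time SDE. All function and parameter names are hypothetical.

```python
import math
import random


def data_score(x):
    """Score (gradient of the log-density) of a toy N(0, 1) 'training' distribution."""
    return -x


def property_grad(x, target=3.0):
    """Gradient of a hypothetical property predictor P(x) = -(x - target)**2 / 2."""
    return -(x - target)


def guided_score(x, ood_lambda=0.0, guidance=0.0):
    """Combine the data score with OOD control and property guidance.

    ood_lambda in [0, 1): larger values weaken the pull toward the training
    distribution (assumed form: the data score is scaled by 1 - sqrt(ood_lambda)).
    guidance >= 0: weight on the property-predictor gradient.
    """
    ood_weight = 1.0 - math.sqrt(ood_lambda)
    return ood_weight * data_score(x) + guidance * property_grad(x)


def sample(n_steps=1000, step=0.01, ood_lambda=0.0, guidance=0.0, seed=0):
    """Draw one sample with Langevin dynamics driven by the guided score."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, 1.0)
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        drift = guided_score(x, ood_lambda, guidance)
        x = x + step * drift + math.sqrt(2.0 * step) * noise
    return x


def mean_sample(n=200, **kwargs):
    """Average of n independent samples, one per seed."""
    return sum(sample(seed=s, **kwargs) for s in range(n)) / n


if __name__ == "__main__":
    print("no guidance:  ", mean_sample())
    print("with guidance:", mean_sample(ood_lambda=0.25, guidance=1.0))
```

In this toy setting the unguided samples stay centred on the training distribution, while a non-zero `ood_lambda` together with property guidance shifts them toward the region the hypothetical predictor scores highly. The real method operates on molecular graphs with learned score and predictor networks, which this sketch does not attempt to reproduce.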