ToDD: Topological Compound Fingerprinting in Computer-Aided Drug Discovery

In computer-aided drug discovery (CADD), virtual screening (VS) is used for identifying the drug candidates that are most likely to bind to a molecular target in a large library of compounds. Most VS methods to date have focused on using canonical compound representations (e.g., SMILES strings, Morg...

Bibliographic details
Main authors:	Demir, Andac; Coskunuzer, Baris; Segovia-Dominguez, Ignacio; Chen, Yuzhou; Gel, Yulia; Kiziltan, Bulent
Format:	Article
Language:	eng
Subjects:	Computer Science - Artificial Intelligence; Computer Science - Learning; Quantitative Biology - Quantitative Methods

container_end_page
container_issue
container_start_page
container_title
container_volume
creator	Demir, Andac; Coskunuzer, Baris; Segovia-Dominguez, Ignacio; Chen, Yuzhou; Gel, Yulia; Kiziltan, Bulent
description	In computer-aided drug discovery (CADD), virtual screening (VS) is used for identifying the drug candidates that are most likely to bind to a molecular target in a large library of compounds. Most VS methods to date have focused on using canonical compound representations (e.g., SMILES strings, Morgan fingerprints) or generating alternative fingerprints of the compounds by training progressively more complex variational autoencoders (VAEs) and graph neural networks (GNNs). Although VAEs and GNNs led to significant improvements in VS performance, these methods suffer from reduced performance when scaling to large virtual compound datasets. The performance of these methods has shown only incremental improvements in the past few years. To address this problem, we developed a novel method using multiparameter persistence (MP) homology that produces topological fingerprints of the compounds as multidimensional vectors. Our primary contribution is framing the VS process as a new topology-based graph ranking problem by partitioning a compound into chemical substructures informed by the periodic properties of its atoms and extracting their persistent homology features at multiple resolution levels. We show that the margin loss fine-tuning of pretrained Triplet networks attains highly competitive results in differentiating between compounds in the embedding space and ranking their likelihood of becoming effective drug candidates. We further establish theoretical guarantees for the stability properties of our proposed MP signatures, and demonstrate that our models, enhanced by the MP signatures, outperform state-of-the-art methods on benchmark datasets by a wide and highly statistically significant margin (e.g., 93% gain for Cleves-Jain and 54% gain for DUD-E Diverse dataset).
doi_str_mv	10.48550/arxiv.2211.03808
format	Article
creationdate	2022-11-07
rights	http://creativecommons.org/licenses/by-nc-sa/4.0
oa	free_for_read
sourcetype	Open Access Repository
linktorsrc	https://arxiv.org/abs/2211.03808
backlink	https://doi.org/10.48550/arXiv.2211.03808
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2211.03808
ispartof
issn
language	eng
recordid	cdi_arxiv_primary_2211_03808
source	arXiv.org
subjects	Computer Science - Artificial Intelligence; Computer Science - Learning; Quantitative Biology - Quantitative Methods
title	ToDD: Topological Compound Fingerprinting in Computer-Aided Drug Discovery
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-19T02%3A28%3A52IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=ToDD:%20Topological%20Compound%20Fingerprinting%20in%20Computer-Aided%20Drug%20Discovery&rft.au=Demir,%20Andac&rft.date=2022-11-07&rft_id=info:doi/10.48550/arxiv.2211.03808&rft_dat=%3Carxiv_GOX%3E2211_03808%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
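
The description above mentions margin-loss fine-tuning of Triplet networks to separate compounds in the embedding space and rank them. The following is a minimal sketch of a generic triplet margin loss and a distance-based ranking, not the authors' implementation. The function names, the margin value, the Euclidean distance, and the two-dimensional toy vectors are all assumptions made for illustration; in the paper the vectors would come from the topological (MP) fingerprints and a trained network.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Generic triplet margin loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is farther from the anchor than
    the positive by at least `margin` (a hypothetical default of 1.0).
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def rank_by_distance(query, library):
    """Return library keys sorted from nearest to farthest from `query`."""
    return sorted(library, key=lambda name: euclidean(query, library[name]))


# Toy embeddings, invented for this example only.
anchor = [0.0, 0.0]      # a known active compound
positive = [0.0, 1.0]    # another active, close to the anchor
negative = [3.0, 4.0]    # a decoy, far from the anchor

# Well-separated triplet: d(a, p) = 1, d(a, n) = 5, so the loss is clipped to 0.
loss = triplet_margin_loss(anchor, positive, negative)

# Swapping the roles violates the margin: 5 - 1 + 1 = 5.
violated = triplet_margin_loss(anchor, negative, positive)

# Ranking a tiny "library" by distance to the anchor.
ranking = rank_by_distance(anchor, {"decoy_like": negative, "active_like": positive})

print(loss, violated, ranking)
```

Ranking by distance to a known active is one simple way to turn such an embedding into a screening order; the record does not state which ranking rule the paper uses.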