Assessing Synthetic Accessibility of Chemical Compounds Using Machine Learning Methods

With de novo rational drug design, scientists can rapidly generate a very large number of potentially biologically active probes. However, many of them may be synthetically infeasible and, therefore, of limited value to drug developers. On the other hand, most of the tools for synthetic accessibilit...

Bibliographic details
Published in: Journal of chemical information and modeling 2010-06, Vol.50 (6), p.979-991
Main authors: Podolyan, Yevgeniy; Walters, Michael A; Karypis, George
Format: Article
Language: eng
Online access: Full text
container_end_page 991
container_issue 6
container_start_page 979
container_title Journal of chemical information and modeling
container_volume 50
creator Podolyan, Yevgeniy
Walters, Michael A
Karypis, George
description With de novo rational drug design, scientists can rapidly generate a very large number of potentially biologically active probes. However, many of them may be synthetically infeasible and, therefore, of limited value to drug developers. On the other hand, most of the tools for synthetic accessibility evaluation are very slow and can process only a few molecules per minute. In this study, we present two approaches to quickly predict the synthetic accessibility of chemical compounds by utilizing support vector machines operating on molecular descriptors. The first approach, RSsvm, is designed to identify the compounds that can be synthesized using a specific set of reactions and starting materials and builds its model by training on the compounds identified as synthetically accessible or not by retrosynthetic analysis. The second approach, DRsvm, is designed to provide a more general assessment of synthetic accessibility that is not tied to any set of reactions or starting materials. The training set compounds for this approach are selected from a diverse library based on the number of other similar compounds within the same library. Both approaches have been shown to perform very well in their corresponding areas of applicability with the RSsvm achieving a receiver operator characteristic score of 0.952 in cross-validation experiments and the DRsvm achieving a score of 0.888 on an independent set of compounds. Our implementations can successfully process thousands of compounds per minute.
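The abstract reports model quality as a receiver operating characteristic (ROC) score. As a minimal sketch of what such a score measures, the snippet below computes the area under the ROC curve in its pairwise-ranking form: the fraction of (positive, negative) pairs that a classifier orders correctly. The function name `roc_auc`, the 0/1 label encoding, and the toy data are assumptions made for illustration; they are not taken from the paper, and this is not the authors' evaluation code.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking (Mann-Whitney form).

    labels: iterable of 0/1, where 1 marks a synthetically accessible
            compound (hypothetical encoding chosen for this sketch).
    scores: classifier decision values; higher means more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (not from the paper).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.6]
print(roc_auc(labels, scores))
```

A score of 1.0 means every accessible compound is ranked above every inaccessible one, and 0.5 corresponds to random ordering. The double loop is quadratic in the number of compounds; a sort-based rank computation would be the usual choice for large libraries.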
doi_str_mv 10.1021/ci900301v
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_733458616</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2073582801</sourcerecordid><originalsourceid>FETCH-LOGICAL-a437t-13def0c960209cc4488266f6b3ced5ccebc8fa6ff6695710ca5b3bce62e6a1553</originalsourceid><addsrcrecordid>eNpl0FtLwzAYBuAgipvTC_-AFEHEi2nSNGlzOYonmHjhAe9K-jVxGT3Mfq2wf2_mpgO9SghPvrx5CTlm9JLRkF2BU5Ryyj53yJCJSI2VpG-7P3uh5IAcIM694UqG-2QQUsElU2xIXieIBtHV78HTsu5mpnMQTABWZ7krXbcMGhukM1M50GWQNtWi6esCg5fvOw8aZq42wdTotv4-MN2sKfCQ7FldojnarCPycnP9nN6Np4-39-lkOtYRj7sx44WxFHzckCqAKEqSUEorcw6mED5FDonV0loplYgZBS1ynoORoZGaCcFH5Hw9d9E2H73BLqscgilLXZumxyzmPBKJZNLL0z9y3vRt7cNlIk48Eknk0cUaQdsgtsZmi9ZVul1mjGarqrPfqr092Qzs88oUv_KnWw_ONkCj7862ugaHWxeq2P8g3joNuA31_8EvVkeSNA</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>578458584</pqid></control><display><type>article</type><title>Assessing Synthetic Accessibility of Chemical Compounds Using Machine Learning Methods</title><source>MEDLINE</source><source>American Chemical Society Journals</source><creator>Podolyan, Yevgeniy ; Walters, Michael A ; Karypis, George</creator><creatorcontrib>Podolyan, Yevgeniy ; Walters, Michael A ; Karypis, George</creatorcontrib><description>With de novo rational drug design, scientists can rapidly generate a very large number of potentially biologically active probes. However, many of them may be synthetically infeasible and, therefore, of limited value to drug developers. On the other hand, most of the tools for synthetic accessibility evaluation are very slow and can process only a few molecules per minute. In this study, we present two approaches to quickly predict the synthetic accessibility of chemical compounds by utilizing support vector machines operating on molecular descriptors. 
The first approach, RSsvm, is designed to identify the compounds that can be synthesized using a specific set of reactions and starting materials and builds its model by training on the compounds identified as synthetically accessible or not by retrosynthetic analysis. The second approach, DRsvm, is designed to provide a more general assessment of synthetic accessibility that is not tied to any set of reactions or starting materials. The training set compounds for this approach are selected from a diverse library based on the number of other similar compounds within the same library. Both approaches have been shown to perform very well in their corresponding areas of applicability with the RSsvm achieving a receiver operator characteristic score of 0.952 in cross-validation experiments and the DRsvm achieving a score of 0.888 on an independent set of compounds. Our implementations can successfully process thousands of compounds per minute.</description><identifier>ISSN: 1549-9596</identifier><identifier>EISSN: 1549-960X</identifier><identifier>DOI: 10.1021/ci900301v</identifier><identifier>PMID: 20536191</identifier><language>eng</language><publisher>Washington, DC: American Chemical Society</publisher><subject>Applied sciences ; Artificial Intelligence ; Chemical compounds ; Chemical Information ; Chemical reactions ; Chemical synthesis ; Computer science; control theory; systems ; Drug Design ; Exact sciences and technology ; Molecular chemistry ; Molecules ; Reproducibility of Results ; ROC Curve ; Small Molecule Libraries - chemical synthesis ; Small Molecule Libraries - chemistry ; Studies</subject><ispartof>Journal of chemical information and modeling, 2010-06, Vol.50 (6), p.979-991</ispartof><rights>Copyright © 2010 American Chemical Society</rights><rights>2015 INIST-CNRS</rights><rights>Copyright American Chemical Society Jun 28, 
2010</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-a437t-13def0c960209cc4488266f6b3ced5ccebc8fa6ff6695710ca5b3bce62e6a1553</citedby><cites>FETCH-LOGICAL-a437t-13def0c960209cc4488266f6b3ced5ccebc8fa6ff6695710ca5b3bce62e6a1553</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://pubs.acs.org/doi/pdf/10.1021/ci900301v$$EPDF$$P50$$Gacs$$H</linktopdf><linktohtml>$$Uhttps://pubs.acs.org/doi/10.1021/ci900301v$$EHTML$$P50$$Gacs$$H</linktohtml><link.rule.ids>314,780,784,2765,27076,27924,27925,56738,56788</link.rule.ids><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&amp;idt=22975537$$DView record in Pascal Francis$$Hfree_for_read</backlink><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/20536191$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Podolyan, Yevgeniy</creatorcontrib><creatorcontrib>Walters, Michael A</creatorcontrib><creatorcontrib>Karypis, George</creatorcontrib><title>Assessing Synthetic Accessibility of Chemical Compounds Using Machine Learning Methods</title><title>Journal of chemical information and modeling</title><addtitle>J. Chem. Inf. Model</addtitle><description>With de novo rational drug design, scientists can rapidly generate a very large number of potentially biologically active probes. However, many of them may be synthetically infeasible and, therefore, of limited value to drug developers. On the other hand, most of the tools for synthetic accessibility evaluation are very slow and can process only a few molecules per minute. In this study, we present two approaches to quickly predict the synthetic accessibility of chemical compounds by utilizing support vector machines operating on molecular descriptors. 
The first approach, RSsvm, is designed to identify the compounds that can be synthesized using a specific set of reactions and starting materials and builds its model by training on the compounds identified as synthetically accessible or not by retrosynthetic analysis. The second approach, DRsvm, is designed to provide a more general assessment of synthetic accessibility that is not tied to any set of reactions or starting materials. The training set compounds for this approach are selected from a diverse library based on the number of other similar compounds within the same library. Both approaches have been shown to perform very well in their corresponding areas of applicability with the RSsvm achieving a receiver operator characteristic score of 0.952 in cross-validation experiments and the DRsvm achieving a score of 0.888 on an independent set of compounds. Our implementations can successfully process thousands of compounds per minute.</description><subject>Applied sciences</subject><subject>Artificial Intelligence</subject><subject>Chemical compounds</subject><subject>Chemical Information</subject><subject>Chemical reactions</subject><subject>Chemical synthesis</subject><subject>Computer science; control theory; systems</subject><subject>Drug Design</subject><subject>Exact sciences and technology</subject><subject>Molecular chemistry</subject><subject>Molecules</subject><subject>Reproducibility of Results</subject><subject>ROC Curve</subject><subject>Small Molecule Libraries - chemical synthesis</subject><subject>Small Molecule Libraries - 
chemistry</subject><subject>Studies</subject><issn>1549-9596</issn><issn>1549-960X</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2010</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNpl0FtLwzAYBuAgipvTC_-AFEHEi2nSNGlzOYonmHjhAe9K-jVxGT3Mfq2wf2_mpgO9SghPvrx5CTlm9JLRkF2BU5Ryyj53yJCJSI2VpG-7P3uh5IAcIM694UqG-2QQUsElU2xIXieIBtHV78HTsu5mpnMQTABWZ7krXbcMGhukM1M50GWQNtWi6esCg5fvOw8aZq42wdTotv4-MN2sKfCQ7FldojnarCPycnP9nN6Np4-39-lkOtYRj7sx44WxFHzckCqAKEqSUEorcw6mED5FDonV0loplYgZBS1ynoORoZGaCcFH5Hw9d9E2H73BLqscgilLXZumxyzmPBKJZNLL0z9y3vRt7cNlIk48Eknk0cUaQdsgtsZmi9ZVul1mjGarqrPfqr092Qzs88oUv_KnWw_ONkCj7862ugaHWxeq2P8g3joNuA31_8EvVkeSNA</recordid><startdate>20100628</startdate><enddate>20100628</enddate><creator>Podolyan, Yevgeniy</creator><creator>Walters, Michael A</creator><creator>Karypis, George</creator><general>American Chemical Society</general><scope>IQODW</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SR</scope><scope>7U5</scope><scope>8BQ</scope><scope>8FD</scope><scope>JG9</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>7X8</scope></search><sort><creationdate>20100628</creationdate><title>Assessing Synthetic Accessibility of Chemical Compounds Using Machine Learning Methods</title><author>Podolyan, Yevgeniy ; Walters, Michael A ; Karypis, George</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a437t-13def0c960209cc4488266f6b3ced5ccebc8fa6ff6695710ca5b3bce62e6a1553</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2010</creationdate><topic>Applied sciences</topic><topic>Artificial Intelligence</topic><topic>Chemical compounds</topic><topic>Chemical Information</topic><topic>Chemical reactions</topic><topic>Chemical synthesis</topic><topic>Computer 
science; control theory; systems</topic><topic>Drug Design</topic><topic>Exact sciences and technology</topic><topic>Molecular chemistry</topic><topic>Molecules</topic><topic>Reproducibility of Results</topic><topic>ROC Curve</topic><topic>Small Molecule Libraries - chemical synthesis</topic><topic>Small Molecule Libraries - chemistry</topic><topic>Studies</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Podolyan, Yevgeniy</creatorcontrib><creatorcontrib>Walters, Michael A</creatorcontrib><creatorcontrib>Karypis, George</creatorcontrib><collection>Pascal-Francis</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>Solid State and Superconductivity Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>MEDLINE - Academic</collection><jtitle>Journal of chemical information and modeling</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Podolyan, Yevgeniy</au><au>Walters, Michael A</au><au>Karypis, George</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Assessing Synthetic Accessibility of Chemical Compounds Using Machine Learning Methods</atitle><jtitle>Journal of chemical information and 
modeling</jtitle><addtitle>J. Chem. Inf. Model</addtitle><date>2010-06-28</date><risdate>2010</risdate><volume>50</volume><issue>6</issue><spage>979</spage><epage>991</epage><pages>979-991</pages><issn>1549-9596</issn><eissn>1549-960X</eissn><abstract>With de novo rational drug design, scientists can rapidly generate a very large number of potentially biologically active probes. However, many of them may be synthetically infeasible and, therefore, of limited value to drug developers. On the other hand, most of the tools for synthetic accessibility evaluation are very slow and can process only a few molecules per minute. In this study, we present two approaches to quickly predict the synthetic accessibility of chemical compounds by utilizing support vector machines operating on molecular descriptors. The first approach, RSsvm, is designed to identify the compounds that can be synthesized using a specific set of reactions and starting materials and builds its model by training on the compounds identified as synthetically accessible or not by retrosynthetic analysis. The second approach, DRsvm, is designed to provide a more general assessment of synthetic accessibility that is not tied to any set of reactions or starting materials. The training set compounds for this approach are selected from a diverse library based on the number of other similar compounds within the same library. Both approaches have been shown to perform very well in their corresponding areas of applicability with the RSsvm achieving a receiver operator characteristic score of 0.952 in cross-validation experiments and the DRsvm achieving a score of 0.888 on an independent set of compounds. Our implementations can successfully process thousands of compounds per minute.</abstract><cop>Washington, DC</cop><pub>American Chemical Society</pub><pmid>20536191</pmid><doi>10.1021/ci900301v</doi><tpages>13</tpages></addata></record>
fulltext fulltext
identifier ISSN: 1549-9596
ispartof Journal of chemical information and modeling, 2010-06, Vol.50 (6), p.979-991
issn 1549-9596
1549-960X
language eng
recordid cdi_proquest_miscellaneous_733458616
source MEDLINE; American Chemical Society Journals
subjects Applied sciences
Artificial Intelligence
Chemical compounds
Chemical Information
Chemical reactions
Chemical synthesis
Computer science; control theory; systems
Drug Design
Exact sciences and technology
Molecular chemistry
Molecules
Reproducibility of Results
ROC Curve
Small Molecule Libraries - chemical synthesis
Small Molecule Libraries - chemistry
Studies
title Assessing Synthetic Accessibility of Chemical Compounds Using Machine Learning Methods
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-28T22%3A25%3A37IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Assessing%20Synthetic%20Accessibility%20of%20Chemical%20Compounds%20Using%20Machine%20Learning%20Methods&rft.jtitle=Journal%20of%20chemical%20information%20and%20modeling&rft.au=Podolyan,%20Yevgeniy&rft.date=2010-06-28&rft.volume=50&rft.issue=6&rft.spage=979&rft.epage=991&rft.pages=979-991&rft.issn=1549-9596&rft.eissn=1549-960X&rft_id=info:doi/10.1021/ci900301v&rft_dat=%3Cproquest_cross%3E2073582801%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=578458584&rft_id=info:pmid/20536191&rfr_iscdi=true