Potential Drugs and Nondrugs: Prediction and Identification of Important Structural Features

Bibliographic details
Published in: Journal of Chemical Information and Computer Sciences, 2000-03, Vol. 40 (2), p. 280-292
Main authors: Wagener, Markus; van Geerestein, Vincent J
Format: Article
Language: eng
Online access: Full text
container_end_page 292
container_issue 2
container_start_page 280
container_title Journal of Chemical Information and Computer Sciences
container_volume 40
creator Wagener, Markus
van Geerestein, Vincent J
description Using decision trees, a model to discriminate between potential drugs and nondrugs has been developed. Compounds from the Available Chemical Directory and the World Drug Index databases were used as the training set; the molecular structures were represented using extended atom types. The error rate on an independent validation data set is 17.4%. The number of false negatives can be reduced by penalizing the misclassification of drugs, so that 92 out of 100 potential drugs are correctly recognized. At the same time, 34 out of 100 nondrugs are classified as potential drugs. The predictions of the model can be used to guide the purchase or selection of compounds for biological screening or the design of combinatorial libraries. Visualizing the generated models as colored trees allowed us to identify a few surprisingly simple features that explain the most significant differences between drugs and nondrugs in the training set: just by testing for the presence of hydroxyl, tertiary or secondary amino, carboxyl, phenol, or enol groups, three quarters of all drugs could already be correctly recognized. The nondrugs, on the other hand, are characterized by their aromatic nature and a low content of functional groups other than halogens. The general applicability of the model is demonstrated by predictions made for several Organon databases.
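The operating point reported for the penalized model can be restated as a confusion matrix. The sketch below assumes a hypothetical balanced set of 100 drugs and 100 nondrugs, which is how the abstract phrases its rates; the `Confusion` class and its field names are illustrative and not taken from the paper.

```python
"""Restate the penalized model's reported rates as a confusion matrix.

Assumption: a hypothetical balanced set of 100 drugs and 100 nondrugs,
matching the "out of 100" phrasing in the abstract.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    tp: int  # drugs predicted as potential drugs
    fn: int  # drugs predicted as nondrugs (false negatives)
    fp: int  # nondrugs predicted as potential drugs (false positives)
    tn: int  # nondrugs predicted as nondrugs

    @property
    def sensitivity(self) -> float:
        """Fraction of drugs recognized (true-positive rate)."""
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        """Fraction of nondrugs rejected (true-negative rate)."""
        return self.tn / (self.tn + self.fp)

    @property
    def error_rate(self) -> float:
        """Overall misclassification rate on this particular set."""
        total = self.tp + self.fn + self.fp + self.tn
        return (self.fn + self.fp) / total


# 92 of 100 drugs recognized; 34 of 100 nondrugs classified as potential drugs.
penalized = Confusion(tp=92, fn=8, fp=34, tn=66)

if __name__ == "__main__":
    print(f"sensitivity = {penalized.sensitivity:.2f}")  # 0.92
    print(f"specificity = {penalized.specificity:.2f}")  # 0.66
    print(f"error rate  = {penalized.error_rate:.2f}")   # 0.21
```

On such a balanced set the penalized model misclassifies 42 of 200 compounds (21%). This is not directly comparable with the 17.4% validation error quoted first, because the class composition of the validation set is not given in this record; what the penalty buys is fewer false negatives at the price of more false positives.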
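The finding that a handful of functional groups already recognizes about three quarters of the drugs amounts to a one-line OR rule. The sketch below is a hypothetical simplification, not the authors' decision tree: it assumes each compound has already been annotated with functional-group labels by some upstream perception step (not shown), and the label strings and the function name `looks_like_potential_drug` are made up for illustration.

```python
"""Toy single-rule screen paraphrased from the abstract.

Hypothetical: not the authors' tree. Compounds are assumed to arrive
pre-annotated with functional-group labels; label names are illustrative.
"""
from typing import Iterable

# Groups the abstract names as most discriminating for drugs.
DRUG_INDICATOR_GROUPS = frozenset({
    "hydroxyl",
    "tertiary_amino",
    "secondary_amino",
    "carboxyl",
    "phenol",
    "enol",
})


def looks_like_potential_drug(groups: Iterable[str]) -> bool:
    """Return True if at least one indicator group is present."""
    return not DRUG_INDICATOR_GROUPS.isdisjoint(groups)


if __name__ == "__main__":
    print(looks_like_potential_drug({"aromatic_ring", "secondary_amino"}))  # True
    print(looks_like_potential_drug({"aromatic_ring", "halogen"}))          # False
```

A compound carrying a secondary amino group passes, while a halogenated aromatic compound with none of the listed groups (the profile the abstract attributes to nondrugs) is rejected. A full decision tree adds further splits to balance the two error types.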
doi_str_mv 10.1021/ci990266t
format Article
addtitle J. Chem. Inf. Comput. Sci.
pmid 10761129
publisher American Chemical Society (United States)
rights Copyright © 2000 American Chemical Society
fulltext fulltext
identifier ISSN: 0095-2338
ispartof Journal of Chemical Information and Computer Sciences, 2000-03, Vol.40 (2), p.280-292
issn 0095-2338
1549-960X
language eng
recordid cdi_proquest_miscellaneous_1859317748
source ACS Publications
title Potential Drugs and Nondrugs: Prediction and Identification of Important Structural Features
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-04T06%3A22%3A21IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Potential%20Drugs%20and%20Nondrugs:%E2%80%89%20Prediction%20and%20Identification%20of%20Important%20Structural%20Features&rft.jtitle=Journal%20of%20Chemical%20Information%20and%20Computer%20Sciences&rft.au=Wagener,%20Markus&rft.date=2000-03-01&rft.volume=40&rft.issue=2&rft.spage=280&rft.epage=292&rft.pages=280-292&rft.issn=0095-2338&rft.eissn=1549-960X&rft_id=info:doi/10.1021/ci990266t&rft_dat=%3Cproquest_cross%3E1859317748%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1859317748&rft_id=info:pmid/10761129&rfr_iscdi=true