Automated Generation of Novel Fragments Using Screening Data, a Dual SMILES Autoencoder, Transfer Learning and Syntax Correction

Fragment-based hit identification (FBHI) allows proportionately greater coverage of chemical space using fewer molecules than traditional high-throughput screening approaches. However, effectively exploiting this advantage is highly dependent on the library design. Solubility, stability, chemical co...

Detailed description


Bibliographic details
Published in:	Journal of chemical information and modeling, 2021-06, Vol.61 (6), p.2547-2559
Main authors:	Bilsland, Alan E; McAulay, Kirsten; West, Ryan; Pugliese, Angelo; Bower, Justin
Format:	Article
Language:	eng
Subjects:	Aromatic compounds; Artificial neural networks; Automation; Chemical fingerprinting; Chemists; Coders; Complexity; Computer architecture; Fragments; Learning; Libraries; Machine Learning; Machine Learning and Deep Learning; Neural networks; Neural Networks, Computer; Particle swarm optimization; Recurrent neural networks; Representations; Screening; Syntax; Training
Online access:	Full text

container_end_page	2559
container_issue	6
container_start_page	2547
container_title	Journal of chemical information and modeling
container_volume	61
creator	Bilsland, Alan E; McAulay, Kirsten; West, Ryan; Pugliese, Angelo; Bower, Justin
description	Fragment-based hit identification (FBHI) allows proportionately greater coverage of chemical space using fewer molecules than traditional high-throughput screening approaches. However, effectively exploiting this advantage is highly dependent on the library design. Solubility, stability, chemical complexity, chemical/shape diversity, and synthetic tractability for fragment elaboration are all critical aspects, and molecule design remains a time-consuming task for computational and medicinal chemists. Artificial neural networks have attracted considerable attention in automated de novo design applications and could also prove useful for fragment library design. Chemical autoencoders are neural networks consisting of encoder and decoder parts, which respectively compress and decompress molecular representations. The decoder is applied to samples drawn from the space of compressed representations to generate novel molecules that can be scored for properties of interest. Here, we report an autoencoder model using a recurrent neural network architecture, which was trained using 486,565 fragments curated from commercial sources, to simultaneously reconstruct both SMILES and chemical fingerprints. To explore its utility in fragment design, we applied transfer learning to the fingerprint decoder layers to train a classifier using 66 frequent hitter fragments identified from our screening campaigns. Using a particle swarm optimization sampling approach, we compare the performance of this “dual” model to an architecture encoding SMILES only. The dual model produced valid SMILES with improved features, considering a range of properties including aromatic ring counts, heavy atom count, synthetic accessibility, and a new fragment complexity score we term Feature Complexity (FeCo). 
Additionally, we demonstrate that generative performance is further enhanced by use of a simple syntax-correction procedure during training, in which invalid and undesirable SMILES are spiked into the training set. Finally, we used the syntax-corrected model to generate a library of novel candidate privileged fragments.
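The particle swarm optimization sampling mentioned in the abstract can be illustrated with a minimal, generic sketch. This is not the authors' implementation: the function names, swarm parameters, and the toy objective below are all assumptions chosen for illustration. In the setting the abstract describes, the objective would presumably decode a latent vector into a SMILES string and score the resulting molecule; here a simple squared distance to a made-up target point stands in for that step.

```python
import random


def pso_minimize(objective, dim, n_particles=20, n_iters=100,
                 bounds=(-1.0, 1.0), inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Generic particle swarm minimizer over a box-bounded vector space.

    All parameter values are illustrative defaults, not taken from the paper.
    """
    rng = random.Random(seed)
    lo, hi = bounds
    # Random initial positions, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Each particle remembers its own best position and score.
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    # The swarm shares one global best.
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity mixes momentum, pull to personal best, pull to global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move and clamp to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_score(z):
    """Hypothetical stand-in for 'decode z, then score the molecule'."""
    target = [0.3, -0.2, 0.5]  # made-up optimum in a 3-D 'latent space'
    return sum((a - b) ** 2 for a, b in zip(z, target))


best_z, best_val = pso_minimize(toy_score, dim=3)
print(best_z, best_val)
```

Swapping `toy_score` for a function that decodes a latent vector and returns a property-based penalty would be one way to adapt this sketch to generative sampling, under the assumption that lower scores are better.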
doi_str_mv	10.1021/acs.jcim.0c01226
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_pubmed_primary_34029470</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2548505417</sourcerecordid><originalsourceid>FETCH-LOGICAL-a360t-2b7be7b3e2658892af270b1a409003f4a5ad43503c4915053d8bb25ce98763ca3</originalsourceid><addsrcrecordid>eNp1kT1v2zAQhomiQZO42TsVBLp0sF1-SuIYOJ-A0w5OgGzCiToFMiQyJaWg2frTS8V2hwKdeMPzPsfDS8gnzpacCf4NbFxubdsvmWVciOwdOeFamYXJ2OP7w6xNdkxOY9wyJqXJxAdyLBUTRuXshPw-Hwffw4A1vUaHAYbWO-ob-t2_YEevAjz16IZIH2LrnujGBkQ3TRcwwJwCvRiho5u72_Xlhk4udNbXGOb0PoCLDQa6RghvEXA13by6AX7RlQ8B7bTrIzlqoIt4tn9n5OHq8n51s1j_uL5dna8XIDM2LESVV5hXEkWmi8IIaETOKg6KmXRWo0BDraRm0irDNdOyLqpKaIumyDNpQc7I1533OfifI8ah7NtosevAoR9jKbQUQrM80TPy5R9068fg0u8SpYpkVzxPFNtRNvgYAzblc2h7CK8lZ-XUTpnaKad2yn07KfJ5Lx6rHuu_gUMdCZjvgLfoYel_fX8AAQWaSg</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2548505417</pqid></control><display><type>article</type><title>Automated Generation of Novel Fragments Using Screening Data, a Dual SMILES Autoencoder, Transfer Learning and Syntax Correction</title><source>ACS Publications</source><source>MEDLINE</source><creator>Bilsland, Alan E ; McAulay, Kirsten ; West, Ryan ; Pugliese, Angelo ; Bower, Justin</creator><creatorcontrib>Bilsland, Alan E ; McAulay, Kirsten ; West, Ryan ; Pugliese, Angelo ; Bower, Justin</creatorcontrib><description>Fragment-based hit identification (FBHI) allows proportionately greater coverage of chemical space using fewer molecules than traditional high-throughput screening approaches. However, effectively exploiting this advantage is highly dependent on the library design. Solubility, stability, chemical complexity, chemical/shape diversity, and synthetic tractability for fragment elaboration are all critical aspects, and molecule design remains a time-consuming task for computational and medicinal chemists. 
Artificial neural networks have attracted considerable attention in automated de novo design applications and could also prove useful for fragment library design. Chemical autoencoders are neural networks consisting of encoder and decoder parts, which respectively compress and decompress molecular representations. The decoder is applied to samples drawn from the space of compressed representations to generate novel molecules that can be scored for properties of interest. Here, we report an autoencoder model using a recurrent neural network architecture, which was trained using 486,565 fragments curated from commercial sources, to simultaneously reconstruct both SMILES and chemical fingerprints. To explore its utility in fragment design, we applied transfer learning to the fingerprint decoder layers to train a classifier using 66 frequent hitter fragments identified from our screening campaigns. Using a particle swarm optimization sampling approach, we compare the performance of this “dual” model to an architecture encoding SMILES only. The dual model produced valid SMILES with improved features, considering a range of properties including aromatic ring counts, heavy atom count, synthetic accessibility, and a new fragment complexity score we term Feature Complexity (FeCo). Additionally, we demonstrate that generative performance is further enhanced by use of a simple syntax-correction procedure during training, in which invalid and undesirable SMILES are spiked into the training set. 
Finally, we used the syntax-corrected model to generate a library of novel candidate privileged fragments.</description><identifier>ISSN: 1549-9596</identifier><identifier>EISSN: 1549-960X</identifier><identifier>DOI: 10.1021/acs.jcim.0c01226</identifier><identifier>PMID: 34029470</identifier><language>eng</language><publisher>United States: American Chemical Society</publisher><subject>Aromatic compounds ; Artificial neural networks ; Automation ; Chemical fingerprinting ; Chemists ; Coders ; Complexity ; Computer architecture ; Fragments ; Learning ; Libraries ; Machine Learning ; Machine Learning and Deep Learning ; Neural networks ; Neural Networks, Computer ; Particle swarm optimization ; Recurrent neural networks ; Representations ; Screening ; Syntax ; Training</subject><ispartof>Journal of chemical information and modeling, 2021-06, Vol.61 (6), p.2547-2559</ispartof><rights>2021 American Chemical Society</rights><rights>Copyright American Chemical Society Jun 28, 2021</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-a360t-2b7be7b3e2658892af270b1a409003f4a5ad43503c4915053d8bb25ce98763ca3</citedby><cites>FETCH-LOGICAL-a360t-2b7be7b3e2658892af270b1a409003f4a5ad43503c4915053d8bb25ce98763ca3</cites><orcidid>0000-0003-0957-3908</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://pubs.acs.org/doi/pdf/10.1021/acs.jcim.0c01226$$EPDF$$P50$$Gacs$$H</linktopdf><linktohtml>$$Uhttps://pubs.acs.org/doi/10.1021/acs.jcim.0c01226$$EHTML$$P50$$Gacs$$H</linktohtml><link.rule.ids>314,776,780,2752,27053,27901,27902,56713,56763</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/34029470$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Bilsland, Alan E</creatorcontrib><creatorcontrib>McAulay, 
Kirsten</creatorcontrib><creatorcontrib>West, Ryan</creatorcontrib><creatorcontrib>Pugliese, Angelo</creatorcontrib><creatorcontrib>Bower, Justin</creatorcontrib><title>Automated Generation of Novel Fragments Using Screening Data, a Dual SMILES Autoencoder, Transfer Learning and Syntax Correction</title><title>Journal of chemical information and modeling</title><addtitle>J. Chem. Inf. Model</addtitle><description>Fragment-based hit identification (FBHI) allows proportionately greater coverage of chemical space using fewer molecules than traditional high-throughput screening approaches. However, effectively exploiting this advantage is highly dependent on the library design. Solubility, stability, chemical complexity, chemical/shape diversity, and synthetic tractability for fragment elaboration are all critical aspects, and molecule design remains a time-consuming task for computational and medicinal chemists. Artificial neural networks have attracted considerable attention in automated de novo design applications and could also prove useful for fragment library design. Chemical autoencoders are neural networks consisting of encoder and decoder parts, which respectively compress and decompress molecular representations. The decoder is applied to samples drawn from the space of compressed representations to generate novel molecules that can be scored for properties of interest. Here, we report an autoencoder model using a recurrent neural network architecture, which was trained using 486,565 fragments curated from commercial sources, to simultaneously reconstruct both SMILES and chemical fingerprints. To explore its utility in fragment design, we applied transfer learning to the fingerprint decoder layers to train a classifier using 66 frequent hitter fragments identified from our screening campaigns. Using a particle swarm optimization sampling approach, we compare the performance of this “dual” model to an architecture encoding SMILES only. 
The dual model produced valid SMILES with improved features, considering a range of properties including aromatic ring counts, heavy atom count, synthetic accessibility, and a new fragment complexity score we term Feature Complexity (FeCo). Additionally, we demonstrate that generative performance is further enhanced by use of a simple syntax-correction procedure during training, in which invalid and undesirable SMILES are spiked into the training set. Finally, we used the syntax-corrected model to generate a library of novel candidate privileged fragments.</description><subject>Aromatic compounds</subject><subject>Artificial neural networks</subject><subject>Automation</subject><subject>Chemical fingerprinting</subject><subject>Chemists</subject><subject>Coders</subject><subject>Complexity</subject><subject>Computer architecture</subject><subject>Fragments</subject><subject>Learning</subject><subject>Libraries</subject><subject>Machine Learning</subject><subject>Machine Learning and Deep Learning</subject><subject>Neural networks</subject><subject>Neural Networks, Computer</subject><subject>Particle swarm optimization</subject><subject>Recurrent neural 
networks</subject><subject>Representations</subject><subject>Screening</subject><subject>Syntax</subject><subject>Training</subject><issn>1549-9596</issn><issn>1549-960X</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNp1kT1v2zAQhomiQZO42TsVBLp0sF1-SuIYOJ-A0w5OgGzCiToFMiQyJaWg2frTS8V2hwKdeMPzPsfDS8gnzpacCf4NbFxubdsvmWVciOwdOeFamYXJ2OP7w6xNdkxOY9wyJqXJxAdyLBUTRuXshPw-Hwffw4A1vUaHAYbWO-ob-t2_YEevAjz16IZIH2LrnujGBkQ3TRcwwJwCvRiho5u72_Xlhk4udNbXGOb0PoCLDQa6RghvEXA13by6AX7RlQ8B7bTrIzlqoIt4tn9n5OHq8n51s1j_uL5dna8XIDM2LESVV5hXEkWmi8IIaETOKg6KmXRWo0BDraRm0irDNdOyLqpKaIumyDNpQc7I1533OfifI8ah7NtosevAoR9jKbQUQrM80TPy5R9068fg0u8SpYpkVzxPFNtRNvgYAzblc2h7CK8lZ-XUTpnaKad2yn07KfJ5Lx6rHuu_gUMdCZjvgLfoYel_fX8AAQWaSg</recordid><startdate>20210628</startdate><enddate>20210628</enddate><creator>Bilsland, Alan E</creator><creator>McAulay, Kirsten</creator><creator>West, Ryan</creator><creator>Pugliese, Angelo</creator><creator>Bower, Justin</creator><general>American Chemical Society</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SR</scope><scope>7U5</scope><scope>8BQ</scope><scope>8FD</scope><scope>JG9</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0003-0957-3908</orcidid></search><sort><creationdate>20210628</creationdate><title>Automated Generation of Novel Fragments Using Screening Data, a Dual SMILES Autoencoder, Transfer Learning and Syntax Correction</title><author>Bilsland, Alan E ; McAulay, Kirsten ; West, Ryan ; Pugliese, Angelo ; Bower, 
Justin</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a360t-2b7be7b3e2658892af270b1a409003f4a5ad43503c4915053d8bb25ce98763ca3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Aromatic compounds</topic><topic>Artificial neural networks</topic><topic>Automation</topic><topic>Chemical fingerprinting</topic><topic>Chemists</topic><topic>Coders</topic><topic>Complexity</topic><topic>Computer architecture</topic><topic>Fragments</topic><topic>Learning</topic><topic>Libraries</topic><topic>Machine Learning</topic><topic>Machine Learning and Deep Learning</topic><topic>Neural networks</topic><topic>Neural Networks, Computer</topic><topic>Particle swarm optimization</topic><topic>Recurrent neural networks</topic><topic>Representations</topic><topic>Screening</topic><topic>Syntax</topic><topic>Training</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Bilsland, Alan E</creatorcontrib><creatorcontrib>McAulay, Kirsten</creatorcontrib><creatorcontrib>West, Ryan</creatorcontrib><creatorcontrib>Pugliese, Angelo</creatorcontrib><creatorcontrib>Bower, Justin</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>Solid State and Superconductivity Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts 
Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>MEDLINE - Academic</collection><jtitle>Journal of chemical information and modeling</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Bilsland, Alan E</au><au>McAulay, Kirsten</au><au>West, Ryan</au><au>Pugliese, Angelo</au><au>Bower, Justin</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Automated Generation of Novel Fragments Using Screening Data, a Dual SMILES Autoencoder, Transfer Learning and Syntax Correction</atitle><jtitle>Journal of chemical information and modeling</jtitle><addtitle>J. Chem. Inf. Model</addtitle><date>2021-06-28</date><risdate>2021</risdate><volume>61</volume><issue>6</issue><spage>2547</spage><epage>2559</epage><pages>2547-2559</pages><issn>1549-9596</issn><eissn>1549-960X</eissn><abstract>Fragment-based hit identification (FBHI) allows proportionately greater coverage of chemical space using fewer molecules than traditional high-throughput screening approaches. However, effectively exploiting this advantage is highly dependent on the library design. Solubility, stability, chemical complexity, chemical/shape diversity, and synthetic tractability for fragment elaboration are all critical aspects, and molecule design remains a time-consuming task for computational and medicinal chemists. Artificial neural networks have attracted considerable attention in automated de novo design applications and could also prove useful for fragment library design. Chemical autoencoders are neural networks consisting of encoder and decoder parts, which respectively compress and decompress molecular representations. The decoder is applied to samples drawn from the space of compressed representations to generate novel molecules that can be scored for properties of interest. 
Here, we report an autoencoder model using a recurrent neural network architecture, which was trained using 486,565 fragments curated from commercial sources, to simultaneously reconstruct both SMILES and chemical fingerprints. To explore its utility in fragment design, we applied transfer learning to the fingerprint decoder layers to train a classifier using 66 frequent hitter fragments identified from our screening campaigns. Using a particle swarm optimization sampling approach, we compare the performance of this “dual” model to an architecture encoding SMILES only. The dual model produced valid SMILES with improved features, considering a range of properties including aromatic ring counts, heavy atom count, synthetic accessibility, and a new fragment complexity score we term Feature Complexity (FeCo). Additionally, we demonstrate that generative performance is further enhanced by use of a simple syntax-correction procedure during training, in which invalid and undesirable SMILES are spiked into the training set. Finally, we used the syntax-corrected model to generate a library of novel candidate privileged fragments.</abstract><cop>United States</cop><pub>American Chemical Society</pub><pmid>34029470</pmid><doi>10.1021/acs.jcim.0c01226</doi><tpages>13</tpages><orcidid>https://orcid.org/0000-0003-0957-3908</orcidid></addata></record>
fulltext	fulltext
identifier	ISSN: 1549-9596
ispartof	Journal of chemical information and modeling, 2021-06, Vol.61 (6), p.2547-2559
issn	1549-9596 1549-960X
language	eng
recordid	cdi_pubmed_primary_34029470
source	ACS Publications; MEDLINE
subjects	Aromatic compounds; Artificial neural networks; Automation; Chemical fingerprinting; Chemists; Coders; Complexity; Computer architecture; Fragments; Learning; Libraries; Machine Learning; Machine Learning and Deep Learning; Neural networks; Neural Networks, Computer; Particle swarm optimization; Recurrent neural networks; Representations; Screening; Syntax; Training
title	Automated Generation of Novel Fragments Using Screening Data, a Dual SMILES Autoencoder, Transfer Learning and Syntax Correction
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-01T20%3A35%3A33IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Automated%20Generation%20of%20Novel%20Fragments%20Using%20Screening%20Data,%20a%20Dual%20SMILES%20Autoencoder,%20Transfer%20Learning%20and%20Syntax%20Correction&rft.jtitle=Journal%20of%20chemical%20information%20and%20modeling&rft.au=Bilsland,%20Alan%20E&rft.date=2021-06-28&rft.volume=61&rft.issue=6&rft.spage=2547&rft.epage=2559&rft.pages=2547-2559&rft.issn=1549-9596&rft.eissn=1549-960X&rft_id=info:doi/10.1021/acs.jcim.0c01226&rft_dat=%3Cproquest_cross%3E2548505417%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2548505417&rft_id=info:pmid/34029470&rfr_iscdi=true