Perturbation-Theory and Machine Learning (PTML) Model for High-Throughput Screening of Parham Reactions: Experimental and Theoretical Studies

Bibliographic details
Published in: Journal of chemical information and modeling 2018-07, Vol. 58 (7), p. 1384-1396
Main authors: Simón-Vidal, Lorena; García-Calvo, Oihane; Oteo, Uxue; Arrasate, Sonia; Lete, Esther; Sotomayor, Nuria; González-Díaz, Humberto
Format: Article
Language: English
container_end_page 1396
container_issue 7
container_start_page 1384
container_title Journal of chemical information and modeling
container_volume 58
creator Simón-Vidal, Lorena
García-Calvo, Oihane
Oteo, Uxue
Arrasate, Sonia
Lete, Esther
Sotomayor, Nuria
González-Díaz, Humberto
description Machine learning (ML) algorithms are gaining importance in the processing of chemical information and the modeling of chemical reactivity problems. In this work, we have developed a perturbation-theory and machine learning (PTML) model, combining perturbation theory (PT) and ML algorithms, for predicting the yield of a given reaction. For this purpose, we selected the Parham cyclization, a general and powerful tool for the synthesis of heterocyclic and carbocyclic compounds. This reaction depends on both structural variables (substitution pattern on the substrate, internal electrophile, ring size, etc.) and operational variables (organolithium reagent, solvent, temperature, time, etc.), so predicting how changes in substrate design (internal electrophile, halide, etc.) or reaction conditions affect the yield is an important task that could help to optimize the reaction design. The PTML model uses PT operators to account for perturbations in the experimental conditions and/or the structural variables of all the molecules involved in a query reaction, relative to a reference reaction. To this end, a dataset of >100 reactions was collected for different substrates and internal electrophiles under different reaction conditions, covering a wide range of yields (0–98%). The best PTML model, found using General Linear Regression (GLR), has R = 0.88 in the training series and R = 0.83 in the external validation series for 10 000 pairs of query and reference reactions. Using multiple reference reactions, the PTML model reaches a final R = 0.95 over all reactions. We also report a comparative study of linear versus nonlinear PTML models based on artificial neural network (ANN) algorithms. The PTML-ANN models (LNN, MLP, RBF), with R ≈ 0.1–0.8, do not outperform the first (linear) PTML model. This result confirms the validity of the linear model. Next, we carried out an experimental and theoretical study of previously unreported Parham reactions to illustrate the practical use of the PTML model. A 500 000-point simulation and a Hammett analysis of the reactivity space of Parham reactions are also reported.
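The record does not give the model equations. As a rough illustration only, the Python sketch below assumes a common PTML form: a PT operator computed as the deviation of a descriptor from its mean over reactions sharing one condition (here, the solvent), and a linear model that predicts the query yield from the reference yield plus the differences in those deviations. The function names, the descriptor, and the toy data are hypothetical, and ordinary least squares (`numpy.linalg.lstsq`) stands in for the paper's GLR fit.

```python
import numpy as np

def pt_operator(values, groups):
    # Assumed PT operator: deviation of a descriptor from its mean over
    # all reactions run under the same condition (e.g. the same solvent).
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    out = np.empty_like(values)
    for g in np.unique(groups):
        mask = groups == g
        out[mask] = values[mask] - values[mask].mean()
    return out

def build_pairs(yields, deltas, pairs):
    # One row per (query, reference) pair: [1, yield_ref, delta_query - delta_ref, ...]
    X = [np.concatenate(([1.0, yields[r]], deltas[q] - deltas[r])) for q, r in pairs]
    y = [yields[q] for q, _ in pairs]
    return np.array(X), np.array(y)

def fit_linear(X, y):
    # Ordinary least squares, used here as a stand-in for a GLR fit
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def pearson_r(y_true, y_pred):
    # Pearson correlation coefficient R between observed and predicted yields
    return float(np.corrcoef(y_true, y_pred)[0, 1])

def predict_with_references(coef, yields, deltas, query, refs):
    # Average the pairwise predictions obtained from several reference reactions
    preds = [coef[0] + coef[1] * yields[r] + (deltas[query] - deltas[r]) @ coef[2:]
             for r in refs]
    return float(np.mean(preds))

# Toy, hypothetical data: one descriptor, two solvents, yields in %
descriptor = [1.0, 2.0, 3.0, 2.0, 4.0, 6.0]
solvent = ["THF", "THF", "THF", "Et2O", "Et2O", "Et2O"]
deltas = np.column_stack([pt_operator(descriptor, solvent)])
yields = 50.0 + 10.0 * deltas[:, 0]  # synthetic: [40, 50, 60, 30, 50, 70]

pairs = [(q, r) for q in range(len(yields)) for r in range(len(yields)) if q != r]
X, y = build_pairs(yields, deltas, pairs)
coef = fit_linear(X, y)
r_value = pearson_r(y, X @ coef)
print(coef.round(3), round(r_value, 3))
```

Because the toy yields are generated exactly from the PT deviations, the fit recovers an intercept of 0, a reference-yield slope of 1 and a perturbation coefficient of 10, with R close to 1; real data would not be this clean. Averaging over several references in `predict_with_references` only loosely mirrors the abstract's "multiple reactions of reference"; how the authors actually combined references is not stated in this record.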
doi_str_mv 10.1021/acs.jcim.8b00286
format Article
pmid 29898360
publisher American Chemical Society, United States
date 2018-07-23
rights Copyright American Chemical Society Jul 23, 2018
addtitle J. Chem. Inf. Model
orcidid https://orcid.org/0000-0001-8624-6842
https://orcid.org/0000-0002-9392-2797
https://orcid.org/0000-0003-3079-6380
linktohtml https://pubs.acs.org/doi/10.1021/acs.jcim.8b00286
linktopdf https://pubs.acs.org/doi/pdf/10.1021/acs.jcim.8b00286
backlink https://www.ncbi.nlm.nih.gov/pubmed/29898360
identifier ISSN: 1549-9596
ispartof Journal of chemical information and modeling, 2018-07, Vol.58 (7), p.1384-1396
issn 1549-9596
1549-960X
language eng
recordid cdi_proquest_miscellaneous_2055614082
source American Chemical Society Publications
subjects Algorithms
Artificial intelligence
Artificial neural networks
Chemical synthesis
Comparative studies
Computer simulation
Design optimization
Learning theory
Linearity
Machine learning
Mathematical models
Organic chemistry
Perturbation methods
Perturbation theory
Reagents
Regression analysis
Substitution reactions
Substrates
Theory
title Perturbation-Theory and Machine Learning (PTML) Model for High-Throughput Screening of Parham Reactions: Experimental and Theoretical Studies
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T14%3A28%3A17IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Perturbation-Theory%20and%20Machine%20Learning%20(PTML)%20Model%20for%20High-Throughput%20Screening%20of%20Parham%20Reactions:%20Experimental%20and%20Theoretical%20Studies&rft.jtitle=Journal%20of%20chemical%20information%20and%20modeling&rft.au=Simo%CC%81n-Vidal,%20Lorena&rft.date=2018-07-23&rft.volume=58&rft.issue=7&rft.spage=1384&rft.epage=1396&rft.pages=1384-1396&rft.issn=1549-9596&rft.eissn=1549-960X&rft_id=info:doi/10.1021/acs.jcim.8b00286&rft_dat=%3Cproquest_cross%3E2055614082%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2120176651&rft_id=info:pmid/29898360&rfr_iscdi=true