Machine learning meets mechanistic modelling for accurate prediction of experimental activation energies

Accurate prediction of chemical reactions in solution is challenging for current state-of-the-art approaches based on transition state modelling with density functional theory. Models based on machine learning have emerged as a promising alternative to address these problems, but these models curren...

Bibliographic details
Published in: Chemical science (Cambridge) 2021-01, Vol.12 (3), p.1163-1175
Main authors: Jorner, Kjell; Brinck, Tore; Norrby, Per-Ola; Buttar, David
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 1175
container_issue 3
container_start_page 1163
container_title Chemical science (Cambridge)
container_volume 12
creator Jorner, Kjell
Brinck, Tore
Norrby, Per-Ola
Buttar, David
description Accurate prediction of chemical reactions in solution is challenging for current state-of-the-art approaches based on transition state modelling with density functional theory. Models based on machine learning have emerged as a promising alternative to address these problems, but these models currently lack the precision to give crucial information on the magnitude of barrier heights, influence of solvents and catalysts and extent of regio- and chemoselectivity. Here, we construct hybrid models which combine the traditional transition state modelling and machine learning to accurately predict reaction barriers. We train a Gaussian Process Regression model to reproduce high-quality experimental kinetic data for the nucleophilic aromatic substitution reaction and use it to predict barriers with a mean absolute error of 0.77 kcal mol⁻¹ for an external test set. The model was further validated on regio- and chemoselectivity prediction on patent reaction data and achieved a competitive top-1 accuracy of 86%, despite not being trained explicitly for this task. Importantly, the model gives error bars for its predictions that can be used for risk assessment by the end user. Hybrid models emerge as the preferred alternative for accurate reaction prediction in the very common low-data situation where only 100-150 rate constants are available for a reaction class. With recent advances in deep learning for quickly predicting barriers and transition state geometries from density functional theory, we envision that hybrid models will soon become a standard alternative to complement current machine learning approaches based on ground-state physical organic descriptors or structural information such as molecular graphs or fingerprints. Hybrid reactivity models, combining mechanistic calculations and machine learning with descriptors, are used to predict barriers for nucleophilic aromatic substitution.
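The description speaks of both rate constants and barriers in kcal mol⁻¹. One standard way to relate the two is the Eyring equation from transition state theory, ΔG‡ = RT ln(k_B T / (h k)). The sketch below illustrates that relation only; this record does not say how the authors converted their kinetic data, so the function names, the 298.15 K default temperature, and the transmission coefficient of 1 are all assumptions made for illustration. For second-order rate constants the result refers to a 1 M standard state.

```python
import math

# Physical constants (exact SI values).
R = 8.314462618          # gas constant, J mol^-1 K^-1
K_B = 1.380649e-23       # Boltzmann constant, J K^-1
H = 6.62607015e-34       # Planck constant, J s
J_PER_KCAL = 4184.0      # thermochemical calorie conversion


def barrier_from_rate(k, temperature=298.15):
    """Hypothetical helper: Eyring barrier in kcal/mol from a rate constant.

    Assumes a transmission coefficient of 1 and k given in s^-1
    (or M^-1 s^-1 with a 1 M standard state).
    """
    if k <= 0 or temperature <= 0:
        raise ValueError("rate constant and temperature must be positive")
    prefactor = K_B * temperature / H
    return R * temperature * math.log(prefactor / k) / J_PER_KCAL


def rate_from_barrier(barrier_kcal, temperature=298.15):
    """Hypothetical helper: inverse of barrier_from_rate."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    prefactor = K_B * temperature / H
    exponent = -barrier_kcal * J_PER_KCAL / (R * temperature)
    return prefactor * math.exp(exponent)


def rate_factor_for_error(error_kcal, temperature=298.15):
    """Multiplicative uncertainty in the rate implied by a barrier error."""
    return math.exp(error_kcal * J_PER_KCAL / (R * temperature))


if __name__ == "__main__":
    print(f"barrier for k = 1 s^-1: {barrier_from_rate(1.0):.2f} kcal/mol")
    print(f"rate factor for a 0.77 kcal/mol error: {rate_factor_for_error(0.77):.2f}")
```

Under these assumptions, a barrier error of 0.77 kcal mol⁻¹ at 298.15 K corresponds to a bit less than a factor of four in the rate constant, which gives a feel for what the reported mean absolute error means in kinetic terms.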
doi_str_mv 10.1039/d0sc04896h
format Article
publisher England: Royal Society of Chemistry
pmid 36299676
eissn 2041-6539
date 2021-01-21
tpages 13
rights This journal is © The Royal Society of Chemistry 2021
peer_reviewed true
oa free_for_read
orcid https://orcid.org/0000-0002-2419-0705
https://orcid.org/0000-0003-2673-075X
https://orcid.org/0000-0001-5466-023X
https://orcid.org/0000-0002-4191-6790
linktohtml https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9528810/
linktopdf https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9528810/pdf/
backlink https://www.ncbi.nlm.nih.gov/pubmed/36299676 (MEDLINE/PubMed)
https://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-291964 (Swedish Publication Index)
fulltext fulltext
identifier ISSN: 2041-6520
ispartof Chemical science (Cambridge), 2021-01, Vol.12 (3), p.1163-1175
issn 2041-6520
2041-6539
2041-6539
language eng
recordid cdi_proquest_miscellaneous_2729530123
source DOAJ Directory of Open Access Journals; SWEPUB Freely available online; PubMed Central Open Access; EZB-FREE-00999 freely available EZB journals; PubMed Central
subjects Chemical reactions
Chemistry
Density functional theory
Gaussian process
Machine learning
Model testing
Rate constants
Regression models
Risk assessment
Substitution reactions
Workflow
title Machine learning meets mechanistic modelling for accurate prediction of experimental activation energies
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-24T08%3A06%3A46IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_swepu&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Machine%20learning%20meets%20mechanistic%20modelling%20for%20accurate%20prediction%20of%20experimental%20activation%20energies&rft.jtitle=Chemical%20science%20(Cambridge)&rft.au=Jorner,%20Kjell&rft.date=2021-01-21&rft.volume=12&rft.issue=3&rft.spage=1163&rft.epage=1175&rft.pages=1163-1175&rft.issn=2041-6520&rft.eissn=2041-6539&rft_id=info:doi/10.1039/d0sc04896h&rft_dat=%3Cproquest_swepu%3E2729530123%3C/proquest_swepu%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2481602307&rft_id=info:pmid/36299676&rfr_iscdi=true