Self-Evolving Machine: A Continuously Improving Model for Molecular Thermochemistry
Because collecting precise and accurate chemistry data is often challenging, chemistry data sets usually only span a small region of chemical space, which limits the performance and the scope of applicability of data-driven models. To address this issue, we integrated an active learning machine with...
Saved in:
Published in: | The journal of physical chemistry. A, Molecules, spectroscopy, kinetics, environment, & general theory, 2019-03, Vol.123 (10), p.2142-2152 |
---|---|
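Main authors: | Li, Yi-Pei ; Han, Kehang ; Grambow, Colin A ; Green, William H |
Format: | Article |
Language: | eng |
Subjects: | INORGANIC, ORGANIC, PHYSICAL, AND ANALYTICAL CHEMISTRY |
Online access: | Full text |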
container_end_page | 2152 |
---|---|
container_issue | 10 |
container_start_page | 2142 |
container_title | The journal of physical chemistry. A, Molecules, spectroscopy, kinetics, environment, & general theory |
container_volume | 123 |
creator | Li, Yi-Pei Han, Kehang Grambow, Colin A Green, William H |
description | Because collecting precise and accurate chemistry data is often challenging, chemistry data sets usually only span a small region of chemical space, which limits the performance and the scope of applicability of data-driven models. To address this issue, we integrated an active learning machine with automatic ab initio calculations to form a self-evolving model that can continuously adapt to new species appointed by the users. In the present work, we demonstrate the self-evolving concept by modeling the formation enthalpies of stable closed-shell polycyclic species calculated at the B3LYP/6-31G(2df,p) level of theory. By combining a molecular graph convolutional neural network with a dropout training strategy, the model we developed can predict density functional theory (DFT) enthalpies for a broad range of polycyclic species and assess the quality of each predicted value. For the species which the current model is uncertain about, the automatic ab initio calculations provide additional training data to improve the performance of the model. For a test set composed of 2858 cyclic and polycyclic hydrocarbons and oxygenates, the enthalpies predicted by the model agree with the reference DFT values with a root-mean-square error of 2.62 kcal/mol. We found that a model originally trained on hydrocarbons and oxygenates can broaden its prediction coverage to nitrogen-containing species via an active learning process, suggesting that the continuous learning strategy is not only able to improve the model accuracy but is also capable of expanding the predictive capacity of a model to unseen species domains. |
doi_str_mv | 10.1021/acs.jpca.8b10789 |
format | Article |
fullrecord | <record><control><sourceid>proquest_osti_</sourceid><recordid>TN_cdi_osti_scitechconnect_1530407</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2187533464</sourcerecordid><originalsourceid>FETCH-LOGICAL-a405t-c4105a3b0528131c6c164a67327b1a2c04571464d4bfddecf41a40bf6616b17f3</originalsourceid><addsrcrecordid>eNp1kD1PwzAQhi0EonztTChiYiDlLraTlK2qClQqYgBmy3EcmsqJi51U6r_HkMLG5Bue9z3fQ8glwhghwTup_Hi9UXKcFwhZPjkgJ8gTiHmC_DDMkE9intLJiJx6vwYApAk7JiMKGc8nnJ6Q11dtqni-tWZbtx_Rs1SrutX30TSa2bar29723uyiRbNxdiBsqU1UWRcmo1VvpIveVto1Vq10U_vO7c7JUSWN1xf794y8P8zfZk_x8uVxMZsuY8mAd7FiCFzSAniSI0WVKkyZTDOaZAXKRAHjGbKUlayoylKrimEIFlWaYlpgVtEzcj30Wt_Vwqu602qlbNtq1QnkFBhkAboZoHDAZ699J8InlTZGtjrcJhLMM05p2BNQGFDlrPdOV2Lj6ka6nUAQ375F8C2-fYu97xC52rf3RaPLv8Cv4ADcDsBP1PauDUb-7_sCTKSLOg</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2187533464</pqid></control><display><type>article</type><title>Self-Evolving Machine: A Continuously Improving Model for Molecular Thermochemistry</title><source>ACS Publications</source><creator>Li, Yi-Pei ; Han, Kehang ; Grambow, Colin A ; Green, William H</creator><creatorcontrib>Li, Yi-Pei ; Han, Kehang ; Grambow, Colin A ; Green, William H ; Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States). National Energy Research Scientific Computing Center (NERSC)</creatorcontrib><description>Because collecting precise and accurate chemistry data is often challenging, chemistry data sets usually only span a small region of chemical space, which limits the performance and the scope of applicability of data-driven models. To address this issue, we integrated an active learning machine with automatic ab initio calculations to form a self-evolving model that can continuously adapt to new species appointed by the users. 
In the present work, we demonstrate the self-evolving concept by modeling the formation enthalpies of stable closed-shell polycyclic species calculated at the B3LYP/6-31G(2df,p) level of theory. By combining a molecular graph convolutional neural network with a dropout training strategy, the model we developed can predict density functional theory (DFT) enthalpies for a broad range of polycyclic species and assess the quality of each predicted value. For the species which the current model is uncertain about, the automatic ab initio calculations provide additional training data to improve the performance of the model. For a test set composed of 2858 cyclic and polycyclic hydrocarbons and oxygenates, the enthalpies predicted by the model agree with the reference DFT values with a root-mean-square error of 2.62 kcal/mol. We found that a model originally trained on hydrocarbons and oxygenates can broaden its prediction coverage to nitrogen-containing species via an active learning process, suggesting that the continuous learning strategy is not only able to improve the model accuracy but is also capable of expanding the predictive capacity of a model to unseen species domains.</description><identifier>ISSN: 1089-5639</identifier><identifier>EISSN: 1520-5215</identifier><identifier>DOI: 10.1021/acs.jpca.8b10789</identifier><identifier>PMID: 30758953</identifier><language>eng</language><publisher>United States: American Chemical Society</publisher><subject>INORGANIC, ORGANIC, PHYSICAL, AND ANALYTICAL CHEMISTRY</subject><ispartof>The journal of physical chemistry. 
A, Molecules, spectroscopy, kinetics, environment, & general theory, 2019-03, Vol.123 (10), p.2142-2152</ispartof><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-a405t-c4105a3b0528131c6c164a67327b1a2c04571464d4bfddecf41a40bf6616b17f3</citedby><cites>FETCH-LOGICAL-a405t-c4105a3b0528131c6c164a67327b1a2c04571464d4bfddecf41a40bf6616b17f3</cites><orcidid>0000-0002-1314-3276 ; 0000-0002-2204-9046 ; 0000-0003-2603-9694 ; 0000-0002-0628-5305 ; 0000000326039694 ; 0000000213143276 ; 0000000222049046 ; 0000000206285305</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://pubs.acs.org/doi/pdf/10.1021/acs.jpca.8b10789$$EPDF$$P50$$Gacs$$H</linktopdf><linktohtml>$$Uhttps://pubs.acs.org/doi/10.1021/acs.jpca.8b10789$$EHTML$$P50$$Gacs$$H</linktohtml><link.rule.ids>230,314,780,784,885,2765,27076,27924,27925,56738,56788</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/30758953$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink><backlink>$$Uhttps://www.osti.gov/servlets/purl/1530407$$D View this record in Osti.gov$$Hfree_for_read</backlink></links><search><creatorcontrib>Li, Yi-Pei</creatorcontrib><creatorcontrib>Han, Kehang</creatorcontrib><creatorcontrib>Grambow, Colin A</creatorcontrib><creatorcontrib>Green, William H</creatorcontrib><creatorcontrib>Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States). National Energy Research Scientific Computing Center (NERSC)</creatorcontrib><title>Self-Evolving Machine: A Continuously Improving Model for Molecular Thermochemistry</title><title>The journal of physical chemistry. A, Molecules, spectroscopy, kinetics, environment, & general theory</title><addtitle>J. Phys. Chem. 
A</addtitle><description>Because collecting precise and accurate chemistry data is often challenging, chemistry data sets usually only span a small region of chemical space, which limits the performance and the scope of applicability of data-driven models. To address this issue, we integrated an active learning machine with automatic ab initio calculations to form a self-evolving model that can continuously adapt to new species appointed by the users. In the present work, we demonstrate the self-evolving concept by modeling the formation enthalpies of stable closed-shell polycyclic species calculated at the B3LYP/6-31G(2df,p) level of theory. By combining a molecular graph convolutional neural network with a dropout training strategy, the model we developed can predict density functional theory (DFT) enthalpies for a broad range of polycyclic species and assess the quality of each predicted value. For the species which the current model is uncertain about, the automatic ab initio calculations provide additional training data to improve the performance of the model. For a test set composed of 2858 cyclic and polycyclic hydrocarbons and oxygenates, the enthalpies predicted by the model agree with the reference DFT values with a root-mean-square error of 2.62 kcal/mol. 
We found that a model originally trained on hydrocarbons and oxygenates can broaden its prediction coverage to nitrogen-containing species via an active learning process, suggesting that the continuous learning strategy is not only able to improve the model accuracy but is also capable of expanding the predictive capacity of a model to unseen species domains.</description><subject>INORGANIC, ORGANIC, PHYSICAL, AND ANALYTICAL CHEMISTRY</subject><issn>1089-5639</issn><issn>1520-5215</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2019</creationdate><recordtype>article</recordtype><recordid>eNp1kD1PwzAQhi0EonztTChiYiDlLraTlK2qClQqYgBmy3EcmsqJi51U6r_HkMLG5Bue9z3fQ8glwhghwTup_Hi9UXKcFwhZPjkgJ8gTiHmC_DDMkE9intLJiJx6vwYApAk7JiMKGc8nnJ6Q11dtqni-tWZbtx_Rs1SrutX30TSa2bar29723uyiRbNxdiBsqU1UWRcmo1VvpIveVto1Vq10U_vO7c7JUSWN1xf794y8P8zfZk_x8uVxMZsuY8mAd7FiCFzSAniSI0WVKkyZTDOaZAXKRAHjGbKUlayoylKrimEIFlWaYlpgVtEzcj30Wt_Vwqu602qlbNtq1QnkFBhkAboZoHDAZ699J8InlTZGtjrcJhLMM05p2BNQGFDlrPdOV2Lj6ka6nUAQ375F8C2-fYu97xC52rf3RaPLv8Cv4ADcDsBP1PauDUb-7_sCTKSLOg</recordid><startdate>20190314</startdate><enddate>20190314</enddate><creator>Li, Yi-Pei</creator><creator>Han, Kehang</creator><creator>Grambow, Colin A</creator><creator>Green, William H</creator><general>American Chemical Society</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><scope>OIOZB</scope><scope>OTOTI</scope><orcidid>https://orcid.org/0000-0002-1314-3276</orcidid><orcidid>https://orcid.org/0000-0002-2204-9046</orcidid><orcidid>https://orcid.org/0000-0003-2603-9694</orcidid><orcidid>https://orcid.org/0000-0002-0628-5305</orcidid><orcidid>https://orcid.org/0000000326039694</orcidid><orcidid>https://orcid.org/0000000213143276</orcidid><orcidid>https://orcid.org/0000000222049046</orcidid><orcidid>https://orcid.org/0000000206285305</orcidid></search><sort><creationdate>20190314</creationdate><title>Self-Evolving Machine: A Continuously Improving Model for Molecular 
Thermochemistry</title><author>Li, Yi-Pei ; Han, Kehang ; Grambow, Colin A ; Green, William H</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a405t-c4105a3b0528131c6c164a67327b1a2c04571464d4bfddecf41a40bf6616b17f3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2019</creationdate><topic>INORGANIC, ORGANIC, PHYSICAL, AND ANALYTICAL CHEMISTRY</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Li, Yi-Pei</creatorcontrib><creatorcontrib>Han, Kehang</creatorcontrib><creatorcontrib>Grambow, Colin A</creatorcontrib><creatorcontrib>Green, William H</creatorcontrib><creatorcontrib>Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States). National Energy Research Scientific Computing Center (NERSC)</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><collection>OSTI.GOV - Hybrid</collection><collection>OSTI.GOV</collection><jtitle>The journal of physical chemistry. A, Molecules, spectroscopy, kinetics, environment, & general theory</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Li, Yi-Pei</au><au>Han, Kehang</au><au>Grambow, Colin A</au><au>Green, William H</au><aucorp>Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States). National Energy Research Scientific Computing Center (NERSC)</aucorp><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Self-Evolving Machine: A Continuously Improving Model for Molecular Thermochemistry</atitle><jtitle>The journal of physical chemistry. A, Molecules, spectroscopy, kinetics, environment, & general theory</jtitle><addtitle>J. Phys. Chem. 
A</addtitle><date>2019-03-14</date><risdate>2019</risdate><volume>123</volume><issue>10</issue><spage>2142</spage><epage>2152</epage><pages>2142-2152</pages><issn>1089-5639</issn><eissn>1520-5215</eissn><abstract>Because collecting precise and accurate chemistry data is often challenging, chemistry data sets usually only span a small region of chemical space, which limits the performance and the scope of applicability of data-driven models. To address this issue, we integrated an active learning machine with automatic ab initio calculations to form a self-evolving model that can continuously adapt to new species appointed by the users. In the present work, we demonstrate the self-evolving concept by modeling the formation enthalpies of stable closed-shell polycyclic species calculated at the B3LYP/6-31G(2df,p) level of theory. By combining a molecular graph convolutional neural network with a dropout training strategy, the model we developed can predict density functional theory (DFT) enthalpies for a broad range of polycyclic species and assess the quality of each predicted value. For the species which the current model is uncertain about, the automatic ab initio calculations provide additional training data to improve the performance of the model. For a test set composed of 2858 cyclic and polycyclic hydrocarbons and oxygenates, the enthalpies predicted by the model agree with the reference DFT values with a root-mean-square error of 2.62 kcal/mol. 
We found that a model originally trained on hydrocarbons and oxygenates can broaden its prediction coverage to nitrogen-containing species via an active learning process, suggesting that the continuous learning strategy is not only able to improve the model accuracy but is also capable of expanding the predictive capacity of a model to unseen species domains.</abstract><cop>United States</cop><pub>American Chemical Society</pub><pmid>30758953</pmid><doi>10.1021/acs.jpca.8b10789</doi><tpages>11</tpages><orcidid>https://orcid.org/0000-0002-1314-3276</orcidid><orcidid>https://orcid.org/0000-0002-2204-9046</orcidid><orcidid>https://orcid.org/0000-0003-2603-9694</orcidid><orcidid>https://orcid.org/0000-0002-0628-5305</orcidid><orcidid>https://orcid.org/0000000326039694</orcidid><orcidid>https://orcid.org/0000000213143276</orcidid><orcidid>https://orcid.org/0000000222049046</orcidid><orcidid>https://orcid.org/0000000206285305</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1089-5639 |
ispartof | The journal of physical chemistry. A, Molecules, spectroscopy, kinetics, environment, & general theory, 2019-03, Vol.123 (10), p.2142-2152 |
issn | 1089-5639 1520-5215 |
language | eng |
recordid | cdi_osti_scitechconnect_1530407 |
source | ACS Publications |
subjects | INORGANIC, ORGANIC, PHYSICAL, AND ANALYTICAL CHEMISTRY |
title | Self-Evolving Machine: A Continuously Improving Model for Molecular Thermochemistry |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-06T04%3A37%3A39IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_osti_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Self-Evolving%20Machine:%20A%20Continuously%20Improving%20Model%20for%20Molecular%20Thermochemistry&rft.jtitle=The%20journal%20of%20physical%20chemistry.%20A,%20Molecules,%20spectroscopy,%20kinetics,%20environment,%20&%20general%20theory&rft.au=Li,%20Yi-Pei&rft.aucorp=Lawrence%20Berkeley%20National%20Laboratory%20(LBNL),%20Berkeley,%20CA%20(United%20States).%20National%20Energy%20Research%20Scientific%20Computing%20Center%20(NERSC)&rft.date=2019-03-14&rft.volume=123&rft.issue=10&rft.spage=2142&rft.epage=2152&rft.pages=2142-2152&rft.issn=1089-5639&rft.eissn=1520-5215&rft_id=info:doi/10.1021/acs.jpca.8b10789&rft_dat=%3Cproquest_osti_%3E2187533464%3C/proquest_osti_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2187533464&rft_id=info:pmid/30758953&rfr_iscdi=true |
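
The description field above says the model uses a dropout training strategy to assess the quality of each prediction, sends species it is uncertain about to automatic ab initio calculations, and is evaluated by root-mean-square error. The selection step and the error metric might be sketched as follows. This is a minimal illustration, not the authors' implementation: `mc_dropout_predict`, `select_uncertain`, `toy_model`, the noise levels, and the threshold are all hypothetical, and a noisy toy function stands in for the graph convolutional network.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def mc_dropout_predict(
    stochastic_model: Callable[[float], float], x: float, n_samples: int = 50
) -> Tuple[float, float]:
    """Call a stochastic model repeatedly; return (mean, standard deviation).

    The spread of the repeated predictions serves as the uncertainty estimate.
    """
    samples = [stochastic_model(x) for _ in range(n_samples)]
    mean = sum(samples) / n_samples
    variance = sum((s - mean) ** 2 for s in samples) / n_samples
    return mean, math.sqrt(variance)


def select_uncertain(
    stochastic_model: Callable[[float], float],
    candidates: Sequence[float],
    threshold: float,
    n_samples: int = 50,
) -> List[float]:
    """Return the candidates whose predictive spread exceeds the threshold.

    In an active-learning loop these would be sent for new reference
    calculations and added to the training set.
    """
    return [
        x
        for x in candidates
        if mc_dropout_predict(stochastic_model, x, n_samples)[1] > threshold
    ]


def rmse(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Root-mean-square error between predictions and reference values."""
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical stand-in for a network with dropout left active at inference:
# inputs below 5 are "familiar" (small noise), the rest "unfamiliar" (large noise).
_rng = random.Random(0)


def toy_model(x: float) -> float:
    noise = 0.01 if x < 5.0 else 1.0
    return 2.0 * x + _rng.gauss(0.0, noise)


to_label = select_uncertain(toy_model, [1.0, 2.0, 8.0, 9.0], threshold=0.3)
print("candidates needing new reference data:", to_label)
print("RMSE on two points:", rmse([1.0, 2.0], [1.0, 4.0]))
```

In a real workflow the selected species would be labelled by new calculations, the model retrained, and the loop repeated until few candidates exceed the uncertainty threshold.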