A quantitative uncertainty metric controls error in neural network-driven chemical discovery
Machine learning (ML) models, such as artificial neural networks, have emerged as a complement to high-throughput screening, enabling characterization of new compounds in seconds instead of hours. The promise of ML models to enable large-scale chemical space exploration can only be realized if it is...
Published in: | Chemical science (Cambridge) 2019-09, Vol.10 (34), p.7913-7922 |
---|---|
Main authors: | Janet, Jon Paul ; Duan, Chenru ; Yang, Tzuhsiung ; Nandy, Aditya ; Kulik, Heather J |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 7922 |
---|---|
container_issue | 34 |
container_start_page | 7913 |
container_title | Chemical science (Cambridge) |
container_volume | 10 |
creator | Janet, Jon Paul ; Duan, Chenru ; Yang, Tzuhsiung ; Nandy, Aditya ; Kulik, Heather J |
description | Machine learning (ML) models, such as artificial neural networks, have emerged as a complement to high-throughput screening, enabling characterization of new compounds in seconds instead of hours. The promise of ML models to enable large-scale chemical space exploration can only be realized if it is straightforward to identify when molecules and materials are outside the model's domain of applicability. Established uncertainty metrics for neural network models are either costly to obtain (e.g., ensemble models) or rely on feature engineering (e.g., feature space distances), and each has limitations in estimating prediction errors for chemical space exploration. We introduce the distance to available data in the latent space of a neural network ML model as a low-cost, quantitative uncertainty metric that works for both inorganic and organic chemistry. The calibrated performance of this approach exceeds widely used uncertainty metrics and is readily applied to models of increasing complexity at no additional cost. Tightening latent distance cutoffs systematically drives down predicted model errors below training errors, thus enabling predictive error control in chemical discovery or identification of useful data points for active learning. A predictive approach for driving down machine learning model errors is introduced and demonstrated across discovery for inorganic and organic chemistry. |
doi_str_mv | 10.1039/c9sc02298h |
format | Article |
fullrecord | <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_proquest_journals_2281131201</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2281131201</sourcerecordid><originalsourceid>FETCH-LOGICAL-c535t-9642d309b8a8ba1a8843bf24928cecf810c410f4f8bc39bf9ae07d2e485dbc723</originalsourceid><addsrcrecordid>eNpdkc1rFTEUxYNY2tJ2070y4EaEqfmamWQjlIfaQqELdSeEzJ2ML3UmaW8yT95_b_TVZzWbE3J_OZzLIeSc0QtGhX4LOgHlXKv1M3LMqWR12wj9fH_n9IicpXRHyxGCNbw7JEdFlRJCHpOvl9XDYkP22Wa_cdUSwGG2PuRtNbuMHiqIIWOcUuUQI1Y-VMEtaKci-UfE7_WA5WeoYO1mD-V98AnixuH2lByMdkru7FFPyJcP7z-vruqb24_Xq8ubGhrR5Fq3kg-C6l5Z1VtmlZKiH7nUXIGDUTEKktFRjqoHoftRW0e7gTupmqGHjosT8m7ne7_0sxvAlcB2MvfoZ4tbE603_06CX5tvcWParpWyo8Xg9aMBxofFpWzmsoObJhtcXJLhgjLV6Y62BX31H3oXFwxlPcO5YkwwTlmh3uwowJgSunEfhlHzqzez0p9Wv3u7KvDLp_H36J-WCvBiB2CC_fRv8eIn0O6fHA</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2281131201</pqid></control><display><type>article</type><title>A quantitative uncertainty metric controls error in neural network-driven chemical discovery</title><source>DOAJ Directory of Open Access Journals</source><source>EZB-FREE-00999 freely available EZB journals</source><source>PubMed Central</source><source>PubMed Central Open Access</source><creator>Janet, Jon Paul ; Duan, Chenru ; Yang, Tzuhsiung ; Nandy, Aditya ; Kulik, Heather J</creator><creatorcontrib>Janet, Jon Paul ; Duan, Chenru ; Yang, Tzuhsiung ; Nandy, Aditya ; Kulik, Heather J</creatorcontrib><description>Machine learning (ML) models, such as artificial neural networks, have emerged as a complement to high-throughput screening, enabling characterization of new compounds in seconds instead of hours. The promise of ML models to enable large-scale chemical space exploration can only be realized if it is straightforward to identify when molecules and materials are outside the model's domain of applicability. 
Established uncertainty metrics for neural network models are either costly to obtain (e.g., ensemble models) or rely on feature engineering (e.g., feature space distances), and each has limitations in estimating prediction errors for chemical space exploration. We introduce the distance to available data in the latent space of a neural network ML model as a low-cost, quantitative uncertainty metric that works for both inorganic and organic chemistry. The calibrated performance of this approach exceeds widely used uncertainty metrics and is readily applied to models of increasing complexity at no additional cost. Tightening latent distance cutoffs systematically drives down predicted model errors below training errors, thus enabling predictive error control in chemical discovery or identification of useful data points for active learning.
A predictive approach for driving down machine learning model errors is introduced and demonstrated across discovery for inorganic and organic chemistry.</description><identifier>ISSN: 2041-6520</identifier><identifier>EISSN: 2041-6539</identifier><identifier>DOI: 10.1039/c9sc02298h</identifier><identifier>PMID: 31588334</identifier><language>eng</language><publisher>England: Royal Society of Chemistry</publisher><subject>Active learning ; Artificial neural networks ; Calibration ; Chemistry ; Data points ; Errors ; Learning theory ; Machine learning ; Mathematical models ; Neural networks ; Organic chemistry ; Parameter uncertainty ; Predictive control ; Space exploration ; Training</subject><ispartof>Chemical science (Cambridge), 2019-09, Vol.1 (34), p.7913-7922</ispartof><rights>This journal is © The Royal Society of Chemistry 2019.</rights><rights>Copyright Royal Society of Chemistry 2019</rights><rights>This journal is © The Royal Society of Chemistry 2019 2019</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c535t-9642d309b8a8ba1a8843bf24928cecf810c410f4f8bc39bf9ae07d2e485dbc723</citedby><cites>FETCH-LOGICAL-c535t-9642d309b8a8ba1a8843bf24928cecf810c410f4f8bc39bf9ae07d2e485dbc723</cites><orcidid>0000-0003-2592-4237 ; 0000-0001-9342-0191 ; 0000-0001-7825-4797 ; 0000-0001-7137-5449</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC6764470/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC6764470/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,723,776,780,860,881,27901,27902,53766,53768</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/31588334$$D View this record in 
MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Janet, Jon Paul</creatorcontrib><creatorcontrib>Duan, Chenru</creatorcontrib><creatorcontrib>Yang, Tzuhsiung</creatorcontrib><creatorcontrib>Nandy, Aditya</creatorcontrib><creatorcontrib>Kulik, Heather J</creatorcontrib><title>A quantitative uncertainty metric controls error in neural network-driven chemical discovery</title><title>Chemical science (Cambridge)</title><addtitle>Chem Sci</addtitle><description>Machine learning (ML) models, such as artificial neural networks, have emerged as a complement to high-throughput screening, enabling characterization of new compounds in seconds instead of hours. The promise of ML models to enable large-scale chemical space exploration can only be realized if it is straightforward to identify when molecules and materials are outside the model's domain of applicability. Established uncertainty metrics for neural network models are either costly to obtain (
e.g., ensemble models) or rely on feature engineering (e.g., feature space distances), and each has limitations in estimating prediction errors for chemical space exploration. We introduce the distance to available data in the latent space of a neural network ML model as a low-cost, quantitative uncertainty metric that works for both inorganic and organic chemistry. The calibrated performance of this approach exceeds widely used uncertainty metrics and is readily applied to models of increasing complexity at no additional cost. Tightening latent distance cutoffs systematically drives down predicted model errors below training errors, thus enabling predictive error control in chemical discovery or identification of useful data points for active learning.
A predictive approach for driving down machine learning model errors is introduced and demonstrated across discovery for inorganic and organic chemistry.</description><subject>Active learning</subject><subject>Artificial neural networks</subject><subject>Calibration</subject><subject>Chemistry</subject><subject>Data points</subject><subject>Errors</subject><subject>Learning theory</subject><subject>Machine learning</subject><subject>Mathematical models</subject><subject>Neural networks</subject><subject>Organic chemistry</subject><subject>Parameter uncertainty</subject><subject>Predictive control</subject><subject>Space exploration</subject><subject>Training</subject><issn>2041-6520</issn><issn>2041-6539</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2019</creationdate><recordtype>article</recordtype><recordid>eNpdkc1rFTEUxYNY2tJ2070y4EaEqfmamWQjlIfaQqELdSeEzJ2ML3UmaW8yT95_b_TVZzWbE3J_OZzLIeSc0QtGhX4LOgHlXKv1M3LMqWR12wj9fH_n9IicpXRHyxGCNbw7JEdFlRJCHpOvl9XDYkP22Wa_cdUSwGG2PuRtNbuMHiqIIWOcUuUQI1Y-VMEtaKci-UfE7_WA5WeoYO1mD-V98AnixuH2lByMdkru7FFPyJcP7z-vruqb24_Xq8ubGhrR5Fq3kg-C6l5Z1VtmlZKiH7nUXIGDUTEKktFRjqoHoftRW0e7gTupmqGHjosT8m7ne7_0sxvAlcB2MvfoZ4tbE603_06CX5tvcWParpWyo8Xg9aMBxofFpWzmsoObJhtcXJLhgjLV6Y62BX31H3oXFwxlPcO5YkwwTlmh3uwowJgSunEfhlHzqzez0p9Wv3u7KvDLp_H36J-WCvBiB2CC_fRv8eIn0O6fHA</recordid><startdate>20190914</startdate><enddate>20190914</enddate><creator>Janet, Jon Paul</creator><creator>Duan, Chenru</creator><creator>Yang, Tzuhsiung</creator><creator>Nandy, Aditya</creator><creator>Kulik, Heather J</creator><general>Royal Society of 
Chemistry</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SR</scope><scope>8BQ</scope><scope>8FD</scope><scope>JG9</scope><scope>7X8</scope><scope>5PM</scope><orcidid>https://orcid.org/0000-0003-2592-4237</orcidid><orcidid>https://orcid.org/0000-0001-9342-0191</orcidid><orcidid>https://orcid.org/0000-0001-7825-4797</orcidid><orcidid>https://orcid.org/0000-0001-7137-5449</orcidid></search><sort><creationdate>20190914</creationdate><title>A quantitative uncertainty metric controls error in neural network-driven chemical discovery</title><author>Janet, Jon Paul ; Duan, Chenru ; Yang, Tzuhsiung ; Nandy, Aditya ; Kulik, Heather J</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c535t-9642d309b8a8ba1a8843bf24928cecf810c410f4f8bc39bf9ae07d2e485dbc723</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2019</creationdate><topic>Active learning</topic><topic>Artificial neural networks</topic><topic>Calibration</topic><topic>Chemistry</topic><topic>Data points</topic><topic>Errors</topic><topic>Learning theory</topic><topic>Machine learning</topic><topic>Mathematical models</topic><topic>Neural networks</topic><topic>Organic chemistry</topic><topic>Parameter uncertainty</topic><topic>Predictive control</topic><topic>Space exploration</topic><topic>Training</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Janet, Jon Paul</creatorcontrib><creatorcontrib>Duan, Chenru</creatorcontrib><creatorcontrib>Yang, Tzuhsiung</creatorcontrib><creatorcontrib>Nandy, Aditya</creatorcontrib><creatorcontrib>Kulik, Heather J</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>Engineered Materials Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>MEDLINE - 
Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Chemical science (Cambridge)</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Janet, Jon Paul</au><au>Duan, Chenru</au><au>Yang, Tzuhsiung</au><au>Nandy, Aditya</au><au>Kulik, Heather J</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A quantitative uncertainty metric controls error in neural network-driven chemical discovery</atitle><jtitle>Chemical science (Cambridge)</jtitle><addtitle>Chem Sci</addtitle><date>2019-09-14</date><risdate>2019</risdate><volume>1</volume><issue>34</issue><spage>7913</spage><epage>7922</epage><pages>7913-7922</pages><issn>2041-6520</issn><eissn>2041-6539</eissn><abstract>Machine learning (ML) models, such as artificial neural networks, have emerged as a complement to high-throughput screening, enabling characterization of new compounds in seconds instead of hours. The promise of ML models to enable large-scale chemical space exploration can only be realized if it is straightforward to identify when molecules and materials are outside the model's domain of applicability. Established uncertainty metrics for neural network models are either costly to obtain (
e.g., ensemble models) or rely on feature engineering (e.g., feature space distances), and each has limitations in estimating prediction errors for chemical space exploration. We introduce the distance to available data in the latent space of a neural network ML model as a low-cost, quantitative uncertainty metric that works for both inorganic and organic chemistry. The calibrated performance of this approach exceeds widely used uncertainty metrics and is readily applied to models of increasing complexity at no additional cost. Tightening latent distance cutoffs systematically drives down predicted model errors below training errors, thus enabling predictive error control in chemical discovery or identification of useful data points for active learning.
A predictive approach for driving down machine learning model errors is introduced and demonstrated across discovery for inorganic and organic chemistry.</abstract><cop>England</cop><pub>Royal Society of Chemistry</pub><pmid>31588334</pmid><doi>10.1039/c9sc02298h</doi><tpages>1</tpages><orcidid>https://orcid.org/0000-0003-2592-4237</orcidid><orcidid>https://orcid.org/0000-0001-9342-0191</orcidid><orcidid>https://orcid.org/0000-0001-7825-4797</orcidid><orcidid>https://orcid.org/0000-0001-7137-5449</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 2041-6520 |
ispartof | Chemical science (Cambridge), 2019-09, Vol.10 (34), p.7913-7922 |
issn | 2041-6520 ; 2041-6539 |
language | eng |
recordid | cdi_proquest_journals_2281131201 |
source | DOAJ Directory of Open Access Journals; EZB-FREE-00999 freely available EZB journals; PubMed Central; PubMed Central Open Access |
subjects | Active learning ; Artificial neural networks ; Calibration ; Chemistry ; Data points ; Errors ; Learning theory ; Machine learning ; Mathematical models ; Neural networks ; Organic chemistry ; Parameter uncertainty ; Predictive control ; Space exploration ; Training |
title | A quantitative uncertainty metric controls error in neural network-driven chemical discovery |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-08T13%3A47%3A52IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20quantitative%20uncertainty%20metric%20controls%20error%20in%20neural%20network-driven%20chemical%20discovery&rft.jtitle=Chemical%20science%20(Cambridge)&rft.au=Janet,%20Jon%20Paul&rft.date=2019-09-14&rft.volume=1&rft.issue=34&rft.spage=7913&rft.epage=7922&rft.pages=7913-7922&rft.issn=2041-6520&rft.eissn=2041-6539&rft_id=info:doi/10.1039/c9sc02298h&rft_dat=%3Cproquest_pubme%3E2281131201%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2281131201&rft_id=info:pmid/31588334&rfr_iscdi=true |
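
The description above names the distance to available data in a network's latent space as the uncertainty metric, with tighter distance cutoffs used to control error. The record gives no implementation details, so the following is only a minimal sketch of that idea, not the authors' method. It assumes the latent vectors have already been extracted from a trained network, uses Euclidean distance averaged over the `k` nearest training points, and treats `cutoff` as a user-chosen threshold. The function names `latent_distance` and `within_cutoff` and the toy data are hypothetical.

```python
import math


def latent_distance(query, train_latents, k=1):
    """Mean Euclidean distance from `query` to its k nearest training latent vectors.

    Assumption: Euclidean distance and k-nearest averaging are illustrative
    choices; the catalog record does not specify either.
    """
    if not train_latents:
        raise ValueError("need at least one training latent vector")
    # Distances from the query to every training point, smallest first.
    dists = sorted(math.dist(query, t) for t in train_latents)
    k = min(k, len(dists))
    return sum(dists[:k]) / k


def within_cutoff(query, train_latents, cutoff, k=1):
    """True if the query lies within `cutoff` of the training data in latent space."""
    return latent_distance(query, train_latents, k) <= cutoff


# Toy two-dimensional latent vectors (hypothetical values).
train_latents = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
near = (0.1, 0.0)  # close to the training data
far = (3.0, 4.0)   # far from the training data

for name, point in (("near", near), ("far", far)):
    d = latent_distance(point, train_latents)
    print(name, round(d, 3), within_cutoff(point, train_latents, cutoff=0.5))
```

Lowering `cutoff` keeps only predictions whose latent vectors sit close to the training data. Under the abstract's claim, that is the subset where model error is expected to be lower, and points rejected by the cutoff are candidates for active learning.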