A framework for automated structure elucidation from routine NMR spectra

Methods to automate structure elucidation that can be applied broadly across chemical structure space have the potential to greatly accelerate chemical discovery. NMR spectroscopy is the most widely used and arguably the most powerful method for elucidating structures of organic molecules. Here we i...

Detailed description

Bibliographic details
Published in: Chemical science (Cambridge) 2021-12, Vol.12 (46), p.15329-15338
Main authors: Huang, Zhaorui; Chen, Michael S; Woroch, Cristian P; Markland, Thomas E; Kanan, Matthew W
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 15338
container_issue 46
container_start_page 15329
container_title Chemical science (Cambridge)
container_volume 12
creator Huang, Zhaorui
Chen, Michael S
Woroch, Cristian P
Markland, Thomas E
Kanan, Matthew W
description Methods to automate structure elucidation that can be applied broadly across chemical structure space have the potential to greatly accelerate chemical discovery. NMR spectroscopy is the most widely used and arguably the most powerful method for elucidating structures of organic molecules. Here we introduce a machine learning (ML) framework that provides a quantitative probabilistic ranking of the most likely structural connectivity of an unknown compound when given routine, experimental one-dimensional ¹H and/or ¹³C NMR spectra. In particular, our ML-based algorithm takes input NMR spectra and (i) predicts the presence of specific substructures out of hundreds of substructures it has learned to identify; (ii) annotates the spectrum to label peaks with predicted substructures; and (iii) uses the substructures to construct candidate constitutional isomers and assign them a probabilistic ranking. Using experimental spectra and molecular formulae for molecules containing up to 10 non-hydrogen atoms, the correct constitutional isomer was the highest-ranking prediction made by our model in 67.4% of the cases and one of the top-ten predictions in 95.8% of the cases. This advance will aid in solving the structure of unknown compounds, and thus further the development of automated structure elucidation tools that could enable the creation of fully autonomous reaction discovery platforms. A machine learning model and graph generator were able to accurately predict the presence of nearly 1000 substructures and the connectivity of small organic molecules from experimental 1D NMR data.
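The description reports how often the correct constitutional isomer is the top-ranked candidate or appears among the top ten. As a minimal sketch of how such a top-k figure could be computed, assuming each case yields a ranked list of candidate structures written as canonical SMILES strings: the function name `top_k_accuracy`, the variable names, and the toy rankings below are hypothetical illustrations, not the authors' code or data.

```python
from typing import Sequence


def top_k_accuracy(
    ranked: Sequence[Sequence[str]], truth: Sequence[str], k: int
) -> float:
    """Fraction of cases whose true structure is among the first k candidates."""
    if len(ranked) != len(truth):
        raise ValueError("ranked and truth must have the same length")
    if not truth:
        raise ValueError("need at least one case")
    # A case counts as a hit when the true structure appears in the top k.
    hits = sum(1 for cands, t in zip(ranked, truth) if t in cands[:k])
    return hits / len(truth)


# Toy, made-up rankings (assumed canonical SMILES); not results from the paper.
ranked_candidates = [
    ["CCO", "COC"],              # true structure ranked first
    ["CC(C)O", "CCCO", "CCOC"],  # true structure ranked second
    ["CC=O", "C1CO1"],           # true structure not generated at all
]
true_structures = ["CCO", "CCCO", "C=CO"]

top1 = top_k_accuracy(ranked_candidates, true_structures, k=1)
top10 = top_k_accuracy(ranked_candidates, true_structures, k=10)
```

Comparing strings only works if both the candidates and the reference structures are canonicalized the same way; a real evaluation would more likely compare molecular graphs.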
doi_str_mv 10.1039/d1sc04105c
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_2616285026</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2616285026</sourcerecordid><originalsourceid>FETCH-LOGICAL-c521t-5f9f772807c92649704e952db24405624d6098e746c63f54de37d7b9ea8ca7233</originalsourceid><addsrcrecordid>eNpdks1PFTEUxRuiAYJs2EMmujEmD_rd6caEPEFIABOVddPXuQOFmemzHxr-e4sPHmo3vcn95dx7eorQHsGHBDN91JHkMCdYuA20TWs1k4LpV-ua4i20m9IdrocxIqjaRFuMayWZYNvo7Ljpox3hV4j3TR9iY0sOo83QNSnH4nKJ0MBQnO9s9mGqdBibGEr2EzRXl1-btASXo32DXvd2SLD7dO-g69OT7_Oz2cWXz-fz44uZE5Tkmeh1rxRtsXKayroG5qAF7RaUcywk5Z3EugXFpZOsF7wDpjq10GBbZxVlbAd9XOkuy2KEzsFUhw9mGf1o44MJ1pt_O5O_NTfhp2mrYYpFFXi7Eggpe5Ocz-BuXZimasOQlmHOH6H3T1Ni-FEgZTP65GAY7AShJEMlkbQVmMqKvvsPvQslTvUNKoW5ppiKtlIfVpSLIaUI_Xpjgs1jkOYT-Tb_E-S8wgd_e1yjz7FVYH8FxOTW3ZefwH4DUWigzQ</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2604920258</pqid></control><display><type>article</type><title>A framework for automated structure elucidation from routine NMR spectra</title><source>DOAJ Directory of Open Access Journals</source><source>PubMed Central Open Access</source><source>EZB-FREE-00999 freely available EZB journals</source><source>PubMed Central</source><creator>Huang, Zhaorui ; Chen, Michael S ; Woroch, Cristian P ; Markland, Thomas E ; Kanan, Matthew W</creator><creatorcontrib>Huang, Zhaorui ; Chen, Michael S ; Woroch, Cristian P ; Markland, Thomas E ; Kanan, Matthew W</creatorcontrib><description>Methods to automate structure elucidation that can be applied broadly across chemical structure space have the potential to greatly accelerate chemical discovery. NMR spectroscopy is the most widely used and arguably the most powerful method for elucidating structures of organic molecules. 
Here we introduce a machine learning (ML) framework that provides a quantitative probabilistic ranking of the most likely structural connectivity of an unknown compound when given routine, experimental one-dimensional ¹H and/or ¹³C NMR spectra. In particular, our ML-based algorithm takes input NMR spectra and (i) predicts the presence of specific substructures out of hundreds of substructures it has learned to identify; (ii) annotates the spectrum to label peaks with predicted substructures; and (iii) uses the substructures to construct candidate constitutional isomers and assign to them a probabilistic ranking. Using experimental spectra and molecular formulae for molecules containing up to 10 non-hydrogen atoms, the correct constitutional isomer was the highest-ranking prediction made by our model in 67.4% of the cases and one of the top-ten predictions in 95.8% of the cases. This advance will aid in solving the structure of unknown compounds, and thus further the development of automated structure elucidation tools that could enable the creation of fully autonomous reaction discovery platforms. 
A machine learning model and graph generator were able to accurately predict for the presence of nearly 1000 substructures and the connectivity of small organic molecules from experimental 1D NMR data.</description><identifier>ISSN: 2041-6520</identifier><identifier>EISSN: 2041-6539</identifier><identifier>DOI: 10.1039/d1sc04105c</identifier><identifier>PMID: 34976353</identifier><language>eng</language><publisher>England: Royal Society of Chemistry</publisher><subject>Algorithms ; Automation ; Chemistry ; Hydrogen atoms ; Isomers ; Machine learning ; NMR spectroscopy ; Organic chemistry ; Ranking ; Spectra ; Spectrum analysis</subject><ispartof>Chemical science (Cambridge), 2021-12, Vol.12 (46), p.15329-15338</ispartof><rights>This journal is © The Royal Society of Chemistry.</rights><rights>Copyright Royal Society of Chemistry 2021</rights><rights>This journal is © The Royal Society of Chemistry 2021 The Royal Society of Chemistry</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c521t-5f9f772807c92649704e952db24405624d6098e746c63f54de37d7b9ea8ca7233</citedby><cites>FETCH-LOGICAL-c521t-5f9f772807c92649704e952db24405624d6098e746c63f54de37d7b9ea8ca7233</cites><orcidid>0000-0003-0767-4238 ; 0000-0002-2747-0518 ; 0000-0002-4601-6222 ; 0000-0003-3463-600X ; 0000-0002-5932-6289 ; 0000000307674238 ; 000000033463600X ; 0000000227470518 ; 0000000259326289 ; 
0000000246016222</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC8635205/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC8635205/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,727,780,784,864,885,27924,27925,53791,53793</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/34976353$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink><backlink>$$Uhttps://www.osti.gov/biblio/1830445$$D View this record in Osti.gov$$Hfree_for_read</backlink></links><search><creatorcontrib>Huang, Zhaorui</creatorcontrib><creatorcontrib>Chen, Michael S</creatorcontrib><creatorcontrib>Woroch, Cristian P</creatorcontrib><creatorcontrib>Markland, Thomas E</creatorcontrib><creatorcontrib>Kanan, Matthew W</creatorcontrib><title>A framework for automated structure elucidation from routine NMR spectra</title><title>Chemical science (Cambridge)</title><addtitle>Chem Sci</addtitle><description>Methods to automate structure elucidation that can be applied broadly across chemical structure space have the potential to greatly accelerate chemical discovery. NMR spectroscopy is the most widely used and arguably the most powerful method for elucidating structures of organic molecules. Here we introduce a machine learning (ML) framework that provides a quantitative probabilistic ranking of the most likely structural connectivity of an unknown compound when given routine, experimental one dimensional 1 H and/or 13 C NMR spectra. 
In particular, our ML-based algorithm takes input NMR spectra and (i) predicts the presence of specific substructures out of hundreds of substructures it has learned to identify; (ii) annotates the spectrum to label peaks with predicted substructures; and (iii) uses the substructures to construct candidate constitutional isomers and assign to them a probabilistic ranking. Using experimental spectra and molecular formulae for molecules containing up to 10 non-hydrogen atoms, the correct constitutional isomer was the highest-ranking prediction made by our model in 67.4% of the cases and one of the top-ten predictions in 95.8% of the cases. This advance will aid in solving the structure of unknown compounds, and thus further the development of automated structure elucidation tools that could enable the creation of fully autonomous reaction discovery platforms. A machine learning model and graph generator were able to accurately predict for the presence of nearly 1000 substructures and the connectivity of small organic molecules from experimental 1D NMR data.</description><subject>Algorithms</subject><subject>Automation</subject><subject>Chemistry</subject><subject>Hydrogen atoms</subject><subject>Isomers</subject><subject>Machine learning</subject><subject>NMR spectroscopy</subject><subject>Organic chemistry</subject><subject>Ranking</subject><subject>Spectra</subject><subject>Spectrum 
analysis</subject><issn>2041-6520</issn><issn>2041-6539</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><recordid>eNpdks1PFTEUxRuiAYJs2EMmujEmD_rd6caEPEFIABOVddPXuQOFmemzHxr-e4sPHmo3vcn95dx7eorQHsGHBDN91JHkMCdYuA20TWs1k4LpV-ua4i20m9IdrocxIqjaRFuMayWZYNvo7Ljpox3hV4j3TR9iY0sOo83QNSnH4nKJ0MBQnO9s9mGqdBibGEr2EzRXl1-btASXo32DXvd2SLD7dO-g69OT7_Oz2cWXz-fz44uZE5Tkmeh1rxRtsXKayroG5qAF7RaUcywk5Z3EugXFpZOsF7wDpjq10GBbZxVlbAd9XOkuy2KEzsFUhw9mGf1o44MJ1pt_O5O_NTfhp2mrYYpFFXi7Eggpe5Ocz-BuXZimasOQlmHOH6H3T1Ni-FEgZTP65GAY7AShJEMlkbQVmMqKvvsPvQslTvUNKoW5ppiKtlIfVpSLIaUI_Xpjgs1jkOYT-Tb_E-S8wgd_e1yjz7FVYH8FxOTW3ZefwH4DUWigzQ</recordid><startdate>20211201</startdate><enddate>20211201</enddate><creator>Huang, Zhaorui</creator><creator>Chen, Michael S</creator><creator>Woroch, Cristian P</creator><creator>Markland, Thomas E</creator><creator>Kanan, Matthew W</creator><general>Royal Society of Chemistry</general><general>Royal Society of Chemistry (RSC)</general><general>The Royal Society of Chemistry</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SR</scope><scope>8BQ</scope><scope>8FD</scope><scope>JG9</scope><scope>7X8</scope><scope>OTOTI</scope><scope>5PM</scope><orcidid>https://orcid.org/0000-0003-0767-4238</orcidid><orcidid>https://orcid.org/0000-0002-2747-0518</orcidid><orcidid>https://orcid.org/0000-0002-4601-6222</orcidid><orcidid>https://orcid.org/0000-0003-3463-600X</orcidid><orcidid>https://orcid.org/0000-0002-5932-6289</orcidid><orcidid>https://orcid.org/0000000307674238</orcidid><orcidid>https://orcid.org/000000033463600X</orcidid><orcidid>https://orcid.org/0000000227470518</orcidid><orcidid>https://orcid.org/0000000259326289</orcidid><orcidid>https://orcid.org/0000000246016222</orcidid></search><sort><creationdate>20211201</creationdate><title>A framework for automated structure elucidation from routine NMR spectra</title><author>Huang, Zhaorui ; Chen, Michael S ; 
Woroch, Cristian P ; Markland, Thomas E ; Kanan, Matthew W</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c521t-5f9f772807c92649704e952db24405624d6098e746c63f54de37d7b9ea8ca7233</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Algorithms</topic><topic>Automation</topic><topic>Chemistry</topic><topic>Hydrogen atoms</topic><topic>Isomers</topic><topic>Machine learning</topic><topic>NMR spectroscopy</topic><topic>Organic chemistry</topic><topic>Ranking</topic><topic>Spectra</topic><topic>Spectrum analysis</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Huang, Zhaorui</creatorcontrib><creatorcontrib>Chen, Michael S</creatorcontrib><creatorcontrib>Woroch, Cristian P</creatorcontrib><creatorcontrib>Markland, Thomas E</creatorcontrib><creatorcontrib>Kanan, Matthew W</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>Engineered Materials Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>MEDLINE - Academic</collection><collection>OSTI.GOV</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Chemical science (Cambridge)</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Huang, Zhaorui</au><au>Chen, Michael S</au><au>Woroch, Cristian P</au><au>Markland, Thomas E</au><au>Kanan, Matthew W</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A framework for automated structure elucidation from routine NMR spectra</atitle><jtitle>Chemical science (Cambridge)</jtitle><addtitle>Chem 
Sci</addtitle><date>2021-12-01</date><risdate>2021</risdate><volume>12</volume><issue>46</issue><spage>15329</spage><epage>15338</epage><pages>15329-15338</pages><issn>2041-6520</issn><eissn>2041-6539</eissn><abstract>Methods to automate structure elucidation that can be applied broadly across chemical structure space have the potential to greatly accelerate chemical discovery. NMR spectroscopy is the most widely used and arguably the most powerful method for elucidating structures of organic molecules. Here we introduce a machine learning (ML) framework that provides a quantitative probabilistic ranking of the most likely structural connectivity of an unknown compound when given routine, experimental one dimensional 1 H and/or 13 C NMR spectra. In particular, our ML-based algorithm takes input NMR spectra and (i) predicts the presence of specific substructures out of hundreds of substructures it has learned to identify; (ii) annotates the spectrum to label peaks with predicted substructures; and (iii) uses the substructures to construct candidate constitutional isomers and assign to them a probabilistic ranking. Using experimental spectra and molecular formulae for molecules containing up to 10 non-hydrogen atoms, the correct constitutional isomer was the highest-ranking prediction made by our model in 67.4% of the cases and one of the top-ten predictions in 95.8% of the cases. This advance will aid in solving the structure of unknown compounds, and thus further the development of automated structure elucidation tools that could enable the creation of fully autonomous reaction discovery platforms. 
A machine learning model and graph generator were able to accurately predict for the presence of nearly 1000 substructures and the connectivity of small organic molecules from experimental 1D NMR data.</abstract><cop>England</cop><pub>Royal Society of Chemistry</pub><pmid>34976353</pmid><doi>10.1039/d1sc04105c</doi><tpages>1</tpages><orcidid>https://orcid.org/0000-0003-0767-4238</orcidid><orcidid>https://orcid.org/0000-0002-2747-0518</orcidid><orcidid>https://orcid.org/0000-0002-4601-6222</orcidid><orcidid>https://orcid.org/0000-0003-3463-600X</orcidid><orcidid>https://orcid.org/0000-0002-5932-6289</orcidid><orcidid>https://orcid.org/0000000307674238</orcidid><orcidid>https://orcid.org/000000033463600X</orcidid><orcidid>https://orcid.org/0000000227470518</orcidid><orcidid>https://orcid.org/0000000259326289</orcidid><orcidid>https://orcid.org/0000000246016222</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 2041-6520
ispartof Chemical science (Cambridge), 2021-12, Vol.12 (46), p.15329-15338
issn 2041-6520
2041-6539
language eng
recordid cdi_proquest_miscellaneous_2616285026
source DOAJ Directory of Open Access Journals; PubMed Central Open Access; EZB-FREE-00999 freely available EZB journals; PubMed Central
subjects Algorithms
Automation
Chemistry
Hydrogen atoms
Isomers
Machine learning
NMR spectroscopy
Organic chemistry
Ranking
Spectra
Spectrum analysis
title A framework for automated structure elucidation from routine NMR spectra
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-24T18%3A49%3A16IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20framework%20for%20automated%20structure%20elucidation%20from%20routine%20NMR%20spectra&rft.jtitle=Chemical%20science%20(Cambridge)&rft.au=Huang,%20Zhaorui&rft.date=2021-12-01&rft.volume=12&rft.issue=46&rft.spage=15329&rft.epage=15338&rft.pages=15329-15338&rft.issn=2041-6520&rft.eissn=2041-6539&rft_id=info:doi/10.1039/d1sc04105c&rft_dat=%3Cproquest_cross%3E2616285026%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2604920258&rft_id=info:pmid/34976353&rfr_iscdi=true