A framework for automated structure elucidation from routine NMR spectra
Methods to automate structure elucidation that can be applied broadly across chemical structure space have the potential to greatly accelerate chemical discovery. NMR spectroscopy is the most widely used and arguably the most powerful method for elucidating structures of organic molecules. Here we i...
Published in: | Chemical science (Cambridge) 2021-12, Vol.12 (46), p.15329-15338 |
---|---|
Main authors: | Huang, Zhaorui ; Chen, Michael S ; Woroch, Cristian P ; Markland, Thomas E ; Kanan, Matthew W |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 15338 |
---|---|
container_issue | 46 |
container_start_page | 15329 |
container_title | Chemical science (Cambridge) |
container_volume | 12 |
creator | Huang, Zhaorui ; Chen, Michael S ; Woroch, Cristian P ; Markland, Thomas E ; Kanan, Matthew W |
description | Methods to automate structure elucidation that can be applied broadly across chemical structure space have the potential to greatly accelerate chemical discovery. NMR spectroscopy is the most widely used and arguably the most powerful method for elucidating structures of organic molecules. Here we introduce a machine learning (ML) framework that provides a quantitative probabilistic ranking of the most likely structural connectivity of an unknown compound when given routine, experimental one-dimensional ¹H and/or ¹³C NMR spectra. In particular, our ML-based algorithm takes input NMR spectra and (i) predicts the presence of specific substructures out of hundreds of substructures it has learned to identify; (ii) annotates the spectrum to label peaks with predicted substructures; and (iii) uses the substructures to construct candidate constitutional isomers and assign to them a probabilistic ranking. Using experimental spectra and molecular formulae for molecules containing up to 10 non-hydrogen atoms, the correct constitutional isomer was the highest-ranking prediction made by our model in 67.4% of the cases and one of the top-ten predictions in 95.8% of the cases. This advance will aid in solving the structure of unknown compounds, and thus further the development of automated structure elucidation tools that could enable the creation of fully autonomous reaction discovery platforms.
A machine learning model and graph generator were able to accurately predict the presence of nearly 1000 substructures and the connectivity of small organic molecules from experimental 1D NMR data. |
doi_str_mv | 10.1039/d1sc04105c |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_2616285026</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2616285026</sourcerecordid><originalsourceid>FETCH-LOGICAL-c521t-5f9f772807c92649704e952db24405624d6098e746c63f54de37d7b9ea8ca7233</originalsourceid><addsrcrecordid>eNpdks1PFTEUxRuiAYJs2EMmujEmD_rd6caEPEFIABOVddPXuQOFmemzHxr-e4sPHmo3vcn95dx7eorQHsGHBDN91JHkMCdYuA20TWs1k4LpV-ua4i20m9IdrocxIqjaRFuMayWZYNvo7Ljpox3hV4j3TR9iY0sOo83QNSnH4nKJ0MBQnO9s9mGqdBibGEr2EzRXl1-btASXo32DXvd2SLD7dO-g69OT7_Oz2cWXz-fz44uZE5Tkmeh1rxRtsXKayroG5qAF7RaUcywk5Z3EugXFpZOsF7wDpjq10GBbZxVlbAd9XOkuy2KEzsFUhw9mGf1o44MJ1pt_O5O_NTfhp2mrYYpFFXi7Eggpe5Ocz-BuXZimasOQlmHOH6H3T1Ni-FEgZTP65GAY7AShJEMlkbQVmMqKvvsPvQslTvUNKoW5ppiKtlIfVpSLIaUI_Xpjgs1jkOYT-Tb_E-S8wgd_e1yjz7FVYH8FxOTW3ZefwH4DUWigzQ</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2604920258</pqid></control><display><type>article</type><title>A framework for automated structure elucidation from routine NMR spectra</title><source>DOAJ Directory of Open Access Journals</source><source>PubMed Central Open Access</source><source>EZB-FREE-00999 freely available EZB journals</source><source>PubMed Central</source><creator>Huang, Zhaorui ; Chen, Michael S ; Woroch, Cristian P ; Markland, Thomas E ; Kanan, Matthew W</creator><creatorcontrib>Huang, Zhaorui ; Chen, Michael S ; Woroch, Cristian P ; Markland, Thomas E ; Kanan, Matthew W</creatorcontrib><description>Methods to automate structure elucidation that can be applied broadly across chemical structure space have the potential to greatly accelerate chemical discovery. NMR spectroscopy is the most widely used and arguably the most powerful method for elucidating structures of organic molecules. 
Here we introduce a machine learning (ML) framework that provides a quantitative probabilistic ranking of the most likely structural connectivity of an unknown compound when given routine, experimental one dimensional
¹H and/or ¹³C NMR spectra. In particular, our ML-based algorithm takes input NMR spectra and (i) predicts the presence of specific substructures out of hundreds of substructures it has learned to identify; (ii) annotates the spectrum to label peaks with predicted substructures; and (iii) uses the substructures to construct candidate constitutional isomers and assign to them a probabilistic ranking. Using experimental spectra and molecular formulae for molecules containing up to 10 non-hydrogen atoms, the correct constitutional isomer was the highest-ranking prediction made by our model in 67.4% of the cases and one of the top-ten predictions in 95.8% of the cases. This advance will aid in solving the structure of unknown compounds, and thus further the development of automated structure elucidation tools that could enable the creation of fully autonomous reaction discovery platforms.
A machine learning model and graph generator were able to accurately predict for the presence of nearly 1000 substructures and the connectivity of small organic molecules from experimental 1D NMR data.</description><identifier>ISSN: 2041-6520</identifier><identifier>EISSN: 2041-6539</identifier><identifier>DOI: 10.1039/d1sc04105c</identifier><identifier>PMID: 34976353</identifier><language>eng</language><publisher>England: Royal Society of Chemistry</publisher><subject>Algorithms ; Automation ; Chemistry ; Hydrogen atoms ; Isomers ; Machine learning ; NMR spectroscopy ; Organic chemistry ; Ranking ; Spectra ; Spectrum analysis</subject><ispartof>Chemical science (Cambridge), 2021-12, Vol.12 (46), p.15329-15338</ispartof><rights>This journal is © The Royal Society of Chemistry.</rights><rights>Copyright Royal Society of Chemistry 2021</rights><rights>This journal is © The Royal Society of Chemistry 2021 The Royal Society of Chemistry</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c521t-5f9f772807c92649704e952db24405624d6098e746c63f54de37d7b9ea8ca7233</citedby><cites>FETCH-LOGICAL-c521t-5f9f772807c92649704e952db24405624d6098e746c63f54de37d7b9ea8ca7233</cites><orcidid>0000-0003-0767-4238 ; 0000-0002-2747-0518 ; 0000-0002-4601-6222 ; 0000-0003-3463-600X ; 0000-0002-5932-6289 ; 0000000307674238 ; 000000033463600X ; 0000000227470518 ; 0000000259326289 ; 
0000000246016222</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC8635205/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC8635205/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,727,780,784,864,885,27924,27925,53791,53793</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/34976353$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink><backlink>$$Uhttps://www.osti.gov/biblio/1830445$$D View this record in Osti.gov$$Hfree_for_read</backlink></links><search><creatorcontrib>Huang, Zhaorui</creatorcontrib><creatorcontrib>Chen, Michael S</creatorcontrib><creatorcontrib>Woroch, Cristian P</creatorcontrib><creatorcontrib>Markland, Thomas E</creatorcontrib><creatorcontrib>Kanan, Matthew W</creatorcontrib><title>A framework for automated structure elucidation from routine NMR spectra</title><title>Chemical science (Cambridge)</title><addtitle>Chem Sci</addtitle><description>Methods to automate structure elucidation that can be applied broadly across chemical structure space have the potential to greatly accelerate chemical discovery. NMR spectroscopy is the most widely used and arguably the most powerful method for elucidating structures of organic molecules. Here we introduce a machine learning (ML) framework that provides a quantitative probabilistic ranking of the most likely structural connectivity of an unknown compound when given routine, experimental one dimensional
¹H and/or ¹³C NMR spectra. In particular, our ML-based algorithm takes input NMR spectra and (i) predicts the presence of specific substructures out of hundreds of substructures it has learned to identify; (ii) annotates the spectrum to label peaks with predicted substructures; and (iii) uses the substructures to construct candidate constitutional isomers and assign to them a probabilistic ranking. Using experimental spectra and molecular formulae for molecules containing up to 10 non-hydrogen atoms, the correct constitutional isomer was the highest-ranking prediction made by our model in 67.4% of the cases and one of the top-ten predictions in 95.8% of the cases. This advance will aid in solving the structure of unknown compounds, and thus further the development of automated structure elucidation tools that could enable the creation of fully autonomous reaction discovery platforms.
A machine learning model and graph generator were able to accurately predict for the presence of nearly 1000 substructures and the connectivity of small organic molecules from experimental 1D NMR data.</description><subject>Algorithms</subject><subject>Automation</subject><subject>Chemistry</subject><subject>Hydrogen atoms</subject><subject>Isomers</subject><subject>Machine learning</subject><subject>NMR spectroscopy</subject><subject>Organic chemistry</subject><subject>Ranking</subject><subject>Spectra</subject><subject>Spectrum analysis</subject><issn>2041-6520</issn><issn>2041-6539</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><recordid>eNpdks1PFTEUxRuiAYJs2EMmujEmD_rd6caEPEFIABOVddPXuQOFmemzHxr-e4sPHmo3vcn95dx7eorQHsGHBDN91JHkMCdYuA20TWs1k4LpV-ua4i20m9IdrocxIqjaRFuMayWZYNvo7Ljpox3hV4j3TR9iY0sOo83QNSnH4nKJ0MBQnO9s9mGqdBibGEr2EzRXl1-btASXo32DXvd2SLD7dO-g69OT7_Oz2cWXz-fz44uZE5Tkmeh1rxRtsXKayroG5qAF7RaUcywk5Z3EugXFpZOsF7wDpjq10GBbZxVlbAd9XOkuy2KEzsFUhw9mGf1o44MJ1pt_O5O_NTfhp2mrYYpFFXi7Eggpe5Ocz-BuXZimasOQlmHOH6H3T1Ni-FEgZTP65GAY7AShJEMlkbQVmMqKvvsPvQslTvUNKoW5ppiKtlIfVpSLIaUI_Xpjgs1jkOYT-Tb_E-S8wgd_e1yjz7FVYH8FxOTW3ZefwH4DUWigzQ</recordid><startdate>20211201</startdate><enddate>20211201</enddate><creator>Huang, Zhaorui</creator><creator>Chen, Michael S</creator><creator>Woroch, Cristian P</creator><creator>Markland, Thomas E</creator><creator>Kanan, Matthew W</creator><general>Royal Society of Chemistry</general><general>Royal Society of Chemistry (RSC)</general><general>The Royal Society of 
Chemistry</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SR</scope><scope>8BQ</scope><scope>8FD</scope><scope>JG9</scope><scope>7X8</scope><scope>OTOTI</scope><scope>5PM</scope><orcidid>https://orcid.org/0000-0003-0767-4238</orcidid><orcidid>https://orcid.org/0000-0002-2747-0518</orcidid><orcidid>https://orcid.org/0000-0002-4601-6222</orcidid><orcidid>https://orcid.org/0000-0003-3463-600X</orcidid><orcidid>https://orcid.org/0000-0002-5932-6289</orcidid><orcidid>https://orcid.org/0000000307674238</orcidid><orcidid>https://orcid.org/000000033463600X</orcidid><orcidid>https://orcid.org/0000000227470518</orcidid><orcidid>https://orcid.org/0000000259326289</orcidid><orcidid>https://orcid.org/0000000246016222</orcidid></search><sort><creationdate>20211201</creationdate><title>A framework for automated structure elucidation from routine NMR spectra</title><author>Huang, Zhaorui ; Chen, Michael S ; Woroch, Cristian P ; Markland, Thomas E ; Kanan, Matthew W</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c521t-5f9f772807c92649704e952db24405624d6098e746c63f54de37d7b9ea8ca7233</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Algorithms</topic><topic>Automation</topic><topic>Chemistry</topic><topic>Hydrogen atoms</topic><topic>Isomers</topic><topic>Machine learning</topic><topic>NMR spectroscopy</topic><topic>Organic chemistry</topic><topic>Ranking</topic><topic>Spectra</topic><topic>Spectrum analysis</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Huang, Zhaorui</creatorcontrib><creatorcontrib>Chen, Michael S</creatorcontrib><creatorcontrib>Woroch, Cristian P</creatorcontrib><creatorcontrib>Markland, Thomas E</creatorcontrib><creatorcontrib>Kanan, Matthew W</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>Engineered Materials 
Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>MEDLINE - Academic</collection><collection>OSTI.GOV</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Chemical science (Cambridge)</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Huang, Zhaorui</au><au>Chen, Michael S</au><au>Woroch, Cristian P</au><au>Markland, Thomas E</au><au>Kanan, Matthew W</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A framework for automated structure elucidation from routine NMR spectra</atitle><jtitle>Chemical science (Cambridge)</jtitle><addtitle>Chem Sci</addtitle><date>2021-12-01</date><risdate>2021</risdate><volume>12</volume><issue>46</issue><spage>15329</spage><epage>15338</epage><pages>15329-15338</pages><issn>2041-6520</issn><eissn>2041-6539</eissn><abstract>Methods to automate structure elucidation that can be applied broadly across chemical structure space have the potential to greatly accelerate chemical discovery. NMR spectroscopy is the most widely used and arguably the most powerful method for elucidating structures of organic molecules. Here we introduce a machine learning (ML) framework that provides a quantitative probabilistic ranking of the most likely structural connectivity of an unknown compound when given routine, experimental one dimensional
¹H and/or ¹³C NMR spectra. In particular, our ML-based algorithm takes input NMR spectra and (i) predicts the presence of specific substructures out of hundreds of substructures it has learned to identify; (ii) annotates the spectrum to label peaks with predicted substructures; and (iii) uses the substructures to construct candidate constitutional isomers and assign to them a probabilistic ranking. Using experimental spectra and molecular formulae for molecules containing up to 10 non-hydrogen atoms, the correct constitutional isomer was the highest-ranking prediction made by our model in 67.4% of the cases and one of the top-ten predictions in 95.8% of the cases. This advance will aid in solving the structure of unknown compounds, and thus further the development of automated structure elucidation tools that could enable the creation of fully autonomous reaction discovery platforms.
A machine learning model and graph generator were able to accurately predict for the presence of nearly 1000 substructures and the connectivity of small organic molecules from experimental 1D NMR data.</abstract><cop>England</cop><pub>Royal Society of Chemistry</pub><pmid>34976353</pmid><doi>10.1039/d1sc04105c</doi><tpages>1</tpages><orcidid>https://orcid.org/0000-0003-0767-4238</orcidid><orcidid>https://orcid.org/0000-0002-2747-0518</orcidid><orcidid>https://orcid.org/0000-0002-4601-6222</orcidid><orcidid>https://orcid.org/0000-0003-3463-600X</orcidid><orcidid>https://orcid.org/0000-0002-5932-6289</orcidid><orcidid>https://orcid.org/0000000307674238</orcidid><orcidid>https://orcid.org/000000033463600X</orcidid><orcidid>https://orcid.org/0000000227470518</orcidid><orcidid>https://orcid.org/0000000259326289</orcidid><orcidid>https://orcid.org/0000000246016222</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 2041-6520 |
ispartof | Chemical science (Cambridge), 2021-12, Vol.12 (46), p.15329-15338 |
issn | 2041-6520 ; 2041-6539 |
language | eng |
recordid | cdi_proquest_miscellaneous_2616285026 |
source | DOAJ Directory of Open Access Journals; PubMed Central Open Access; EZB-FREE-00999 freely available EZB journals; PubMed Central |
subjects | Algorithms ; Automation ; Chemistry ; Hydrogen atoms ; Isomers ; Machine learning ; NMR spectroscopy ; Organic chemistry ; Ranking ; Spectra ; Spectrum analysis |
title | A framework for automated structure elucidation from routine NMR spectra |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-24T18%3A49%3A16IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20framework%20for%20automated%20structure%20elucidation%20from%20routine%20NMR%20spectra&rft.jtitle=Chemical%20science%20(Cambridge)&rft.au=Huang,%20Zhaorui&rft.date=2021-12-01&rft.volume=12&rft.issue=46&rft.spage=15329&rft.epage=15338&rft.pages=15329-15338&rft.issn=2041-6520&rft.eissn=2041-6539&rft_id=info:doi/10.1039/d1sc04105c&rft_dat=%3Cproquest_cross%3E2616285026%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2604920258&rft_id=info:pmid/34976353&rfr_iscdi=true |
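
The description in this record reports how often the correct constitutional isomer was ranked first (67.4%) or appeared among the top ten predictions (95.8%). The following is a minimal Python sketch of how a top-k accuracy of that kind can be computed from ranked candidate lists. The function name `top_k_accuracy`, the toy SMILES strings, and the use of plain string equality to compare structures are illustrative assumptions; none of it is the authors' code or data.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_candidates: Sequence[Sequence[str]],
    truths: Sequence[str],
    k: int,
) -> float:
    """Return the fraction of cases whose true structure is among the first k candidates.

    Each entry of ranked_candidates is one case: a list of candidate structures
    ordered from most to least probable. Structures are compared as strings,
    which assumes (hypothetically) that they are already canonical identifiers.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(ranked_candidates) != len(truths):
        raise ValueError("need exactly one true structure per ranked list")
    if not truths:
        raise ValueError("at least one case is required")

    # A case counts as a hit when the true structure occurs in the top-k slice.
    hits = sum(
        truth in candidates[:k]
        for candidates, truth in zip(ranked_candidates, truths)
    )
    return hits / len(truths)


# Toy data (hypothetical): three cases with ranked candidate SMILES strings.
rankings = [
    ["CCO", "COC"],             # true structure ranked first
    ["COC", "CCO"],             # true structure ranked second
    ["CC(C)O", "CCCO", "CCOC"], # true structure ranked third
]
truths = ["CCO", "CCO", "CCOC"]

for k in (1, 2, 3):
    print(f"top-{k} accuracy: {top_k_accuracy(rankings, truths, k):.3f}")
```

String equality only works here because the toy identifiers are written consistently; comparing real structures would need a canonical representation of each molecule before the lookup.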