Machine learning dihydrogen activation in the chemical space surrounding Vaska's complex

Homogeneous catalysis using transition metal complexes is ubiquitously used for organic synthesis, as well as technologically relevant in applications such as water splitting and CO₂ reduction. The key steps underlying homogeneous catalysis require a specific combination of electronic and steric ef...

Bibliographic details
Published in: Chemical science (Cambridge), 2020-05, Vol. 11 (18), p. 4584-4601
Main authors: Friederich, Pascal; dos Passos Gomes, Gabriel; De Bin, Riccardo; Aspuru-Guzik, Alán; Balcells, David
Format: Article
Language: English
Subjects:
Online access: Full text
container_end_page 4601
container_issue 18
container_start_page 4584
container_title Chemical science (Cambridge)
container_volume 11
creator Friederich, Pascal
dos Passos Gomes, Gabriel
De Bin, Riccardo
Aspuru-Guzik, Alán
Balcells, David
description Homogeneous catalysis using transition metal complexes is ubiquitously used for organic synthesis, as well as technologically relevant in applications such as water splitting and CO₂ reduction. The key steps underlying homogeneous catalysis require a specific combination of electronic and steric effects from the ligands bound to the metal center. Finding the optimal combination of ligands is a challenging task due to the exceedingly large number of possibilities and the non-trivial ligand-ligand interactions. The classic example of Vaska's complex, trans-[Ir(PPh₃)₂(CO)(Cl)], illustrates this scenario. The ligands of this species activate iridium for the oxidative addition of hydrogen, yielding the dihydride cis-[Ir(H)₂(PPh₃)₂(CO)(Cl)] complex. Despite the simplicity of this system, thousands of derivatives can be formulated for the activation of H₂, with a limited number of ligands belonging to the same general categories found in the original complex. In this work, we show how DFT and machine learning (ML) methods can be combined to enable the prediction of reactivity within large chemical spaces containing thousands of complexes. In a space of 2574 species derived from Vaska's complex, data from DFT calculations are used to train and test ML models that predict the H₂-activation barrier. In contrast to experiments and calculations requiring several days to be completed, the ML models were trained and used on a laptop on a time-scale of minutes. As a first approach, we combined Bayesian-optimized artificial neural networks (ANN) with features derived from autocorrelation and deltametric functions. The resulting ANNs achieved high accuracies, with mean absolute errors (MAE) between 1 and 2 kcal mol⁻¹, depending on the size of the training set. By using a Gaussian process (GP) model trained with a set of selected features, including fingerprints, accuracy was further enhanced.
Remarkably, this GP model minimized the MAE below 1 kcal mol⁻¹, by using only 20% or less of the data available for training. The gradient boosting (GB) method was also used to assess the relevance of the features, which was used for both feature selection and model interpretation purposes. Features accounting for chemical composition, atom size and electronegativity were found to be the most determinant in the predictions. Further, the ligand fragments with the strongest influence on the H₂-activation barrier were identified. A machine learning exploration of the chemical space surrounding Vaska's complex.
doi_str_mv 10.1039/d0sc00445f
format Article
fullrecord <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_7659707</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2463602761</sourcerecordid><originalsourceid>FETCH-LOGICAL-c491t-9672395cd34d93fdeeed40fba73143c10933db2307a6643b47e1adb3c28dbae43</originalsourceid><addsrcrecordid>eNp9kc1v1DAQxS0EolXphTtgTiCkBdvjxOsLEtq2gFTEgQ9xsxx7sjEkdmonFf3vSdl2gQtzmZHeb96M9Ah5yNlLzkC_8qw4xqSs2jvkUDDJV3UF-u5-FuyAHJfynS0FwCuh7pMDACGWFX1Ivn2wrgsRaY82xxC31Ifuyue0xUitm8KlnUKKNEQ6dUhdh0NwtqdltA5pmXNOc_TXe19t-WGfFerSMPb48wG519q-4PFNPyJfzk4_b96tzj--fb95c75yUvNppWslQFfOg_QaWo-IXrK2sQq4BMeZBvCNAKZsXUtopEJufQNOrH1jUcIReb3zHedmQO8wTtn2ZsxhsPnKJBvMv0oMndmmS6PqSiumFoMnOwOXQ5lCNDFlazhbV8KspZLrhXh-cyKnixnLZIZQHPa9jZjmYoSsoWZC1XxBX9yapVIytvtHODPXeZkT9mnzO6-zBX789-t79DadBXi0A3Jxe_VP4Iv-9H-6GX0LvwDqX6YB</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2463602761</pqid></control><display><type>article</type><title>Machine learning dihydrogen activation in the chemical space surrounding Vaska's complex</title><source>NORA - Norwegian Open Research Archives</source><source>DOAJ Directory of Open Access Journals</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>PubMed Central</source><source>PubMed Central Open Access</source><creator>Friederich, Pascal ; dos Passos Gomes, Gabriel ; De Bin, Riccardo ; Aspuru-Guzik, Alán ; Balcells, David</creator><creatorcontrib>Friederich, Pascal ; dos Passos Gomes, Gabriel ; De Bin, Riccardo ; Aspuru-Guzik, Alán ; Balcells, David</creatorcontrib><description>Homogeneous catalysis using transition metal complexes is ubiquitously used for organic synthesis, as well as technologically relevant in applications such as water splitting and CO 2 reduction. 
The key steps underlying homogeneous catalysis require a specific combination of electronic and steric effects from the ligands bound to the metal center. Finding the optimal combination of ligands is a challenging task due to the exceedingly large number of possibilities and the non-trivial ligand-ligand interactions. The classic example of Vaska's complex, trans -[Ir(PPh 3 ) 2 (CO)(Cl)], illustrates this scenario. The ligands of this species activate iridium for the oxidative addition of hydrogen, yielding the dihydride cis -[Ir(H) 2 (PPh 3 ) 2 (CO)(Cl)] complex. Despite the simplicity of this system, thousands of derivatives can be formulated for the activation of H 2 , with a limited number of ligands belonging to the same general categories found in the original complex. In this work, we show how DFT and machine learning (ML) methods can be combined to enable the prediction of reactivity within large chemical spaces containing thousands of complexes. In a space of 2574 species derived from Vaska's complex, data from DFT calculations are used to train and test ML models that predict the H 2 -activation barrier. In contrast to experiments and calculations requiring several days to be completed, the ML models were trained and used on a laptop on a time-scale of minutes. As a first approach, we combined Bayesian-optimized artificial neural networks (ANN) with features derived from autocorrelation and deltametric functions. The resulting ANNs achieved high accuracies, with mean absolute errors (MAE) between 1 and 2 kcal mol −1 , depending on the size of the training set. By using a Gaussian process (GP) model trained with a set of selected features, including fingerprints, accuracy was further enhanced. Remarkably, this GP model minimized the MAE below 1 kcal mol −1 , by using only 20% or less of the data available for training. 
The gradient boosting (GB) method was also used to assess the relevance of the features, which was used for both feature selection and model interpretation purposes. Features accounting for chemical composition, atom size and electronegativity were found to be the most determinant in the predictions. Further, the ligand fragments with the strongest influence on the H 2 -activation barrier were identified. A machine learning exploration of the chemical space surrounding Vaska's complex.</description><identifier>ISSN: 2041-6520</identifier><identifier>EISSN: 2041-6539</identifier><identifier>DOI: 10.1039/d0sc00445f</identifier><identifier>PMID: 33224459</identifier><language>eng</language><publisher>England: Royal Society of Chemistry</publisher><subject>Chemistry</subject><ispartof>Chemical science (Cambridge), 2020-05, Vol.11 (18), p.4584-461</ispartof><rights>This journal is © The Royal Society of Chemistry 2020.</rights><rights>info:eu-repo/semantics/openAccess</rights><rights>This journal is © The Royal Society of Chemistry 2020 2020</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c491t-9672395cd34d93fdeeed40fba73143c10933db2307a6643b47e1adb3c28dbae43</citedby><cites>FETCH-LOGICAL-c491t-9672395cd34d93fdeeed40fba73143c10933db2307a6643b47e1adb3c28dbae43</cites><orcidid>0000-0003-4465-1465 ; 0000-0002-8235-5969 ; 
0000-0002-3389-0543</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC7659707/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC7659707/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,723,776,780,860,881,26544,27901,27902,53766,53768</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/33224459$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Friederich, Pascal</creatorcontrib><creatorcontrib>dos Passos Gomes, Gabriel</creatorcontrib><creatorcontrib>De Bin, Riccardo</creatorcontrib><creatorcontrib>Aspuru-Guzik, Alán</creatorcontrib><creatorcontrib>Balcells, David</creatorcontrib><title>Machine learning dihydrogen activation in the chemical space surrounding Vaska's complex</title><title>Chemical science (Cambridge)</title><addtitle>Chem Sci</addtitle><description>Homogeneous catalysis using transition metal complexes is ubiquitously used for organic synthesis, as well as technologically relevant in applications such as water splitting and CO 2 reduction. The key steps underlying homogeneous catalysis require a specific combination of electronic and steric effects from the ligands bound to the metal center. Finding the optimal combination of ligands is a challenging task due to the exceedingly large number of possibilities and the non-trivial ligand-ligand interactions. The classic example of Vaska's complex, trans -[Ir(PPh 3 ) 2 (CO)(Cl)], illustrates this scenario. The ligands of this species activate iridium for the oxidative addition of hydrogen, yielding the dihydride cis -[Ir(H) 2 (PPh 3 ) 2 (CO)(Cl)] complex. 
Despite the simplicity of this system, thousands of derivatives can be formulated for the activation of H 2 , with a limited number of ligands belonging to the same general categories found in the original complex. In this work, we show how DFT and machine learning (ML) methods can be combined to enable the prediction of reactivity within large chemical spaces containing thousands of complexes. In a space of 2574 species derived from Vaska's complex, data from DFT calculations are used to train and test ML models that predict the H 2 -activation barrier. In contrast to experiments and calculations requiring several days to be completed, the ML models were trained and used on a laptop on a time-scale of minutes. As a first approach, we combined Bayesian-optimized artificial neural networks (ANN) with features derived from autocorrelation and deltametric functions. The resulting ANNs achieved high accuracies, with mean absolute errors (MAE) between 1 and 2 kcal mol −1 , depending on the size of the training set. By using a Gaussian process (GP) model trained with a set of selected features, including fingerprints, accuracy was further enhanced. Remarkably, this GP model minimized the MAE below 1 kcal mol −1 , by using only 20% or less of the data available for training. The gradient boosting (GB) method was also used to assess the relevance of the features, which was used for both feature selection and model interpretation purposes. Features accounting for chemical composition, atom size and electronegativity were found to be the most determinant in the predictions. Further, the ligand fragments with the strongest influence on the H 2 -activation barrier were identified. 
A machine learning exploration of the chemical space surrounding Vaska's complex.</description><subject>Chemistry</subject><issn>2041-6520</issn><issn>2041-6539</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>3HK</sourceid><recordid>eNp9kc1v1DAQxS0EolXphTtgTiCkBdvjxOsLEtq2gFTEgQ9xsxx7sjEkdmonFf3vSdl2gQtzmZHeb96M9Ah5yNlLzkC_8qw4xqSs2jvkUDDJV3UF-u5-FuyAHJfynS0FwCuh7pMDACGWFX1Ivn2wrgsRaY82xxC31Ifuyue0xUitm8KlnUKKNEQ6dUhdh0NwtqdltA5pmXNOc_TXe19t-WGfFerSMPb48wG519q-4PFNPyJfzk4_b96tzj--fb95c75yUvNppWslQFfOg_QaWo-IXrK2sQq4BMeZBvCNAKZsXUtopEJufQNOrH1jUcIReb3zHedmQO8wTtn2ZsxhsPnKJBvMv0oMndmmS6PqSiumFoMnOwOXQ5lCNDFlazhbV8KspZLrhXh-cyKnixnLZIZQHPa9jZjmYoSsoWZC1XxBX9yapVIytvtHODPXeZkT9mnzO6-zBX789-t79DadBXi0A3Jxe_VP4Iv-9H-6GX0LvwDqX6YB</recordid><startdate>20200514</startdate><enddate>20200514</enddate><creator>Friederich, Pascal</creator><creator>dos Passos Gomes, Gabriel</creator><creator>De Bin, Riccardo</creator><creator>Aspuru-Guzik, Alán</creator><creator>Balcells, David</creator><general>Royal Society of Chemistry</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><scope>3HK</scope><scope>5PM</scope><orcidid>https://orcid.org/0000-0003-4465-1465</orcidid><orcidid>https://orcid.org/0000-0002-8235-5969</orcidid><orcidid>https://orcid.org/0000-0002-3389-0543</orcidid></search><sort><creationdate>20200514</creationdate><title>Machine learning dihydrogen activation in the chemical space surrounding Vaska's complex</title><author>Friederich, Pascal ; dos Passos Gomes, Gabriel ; De Bin, Riccardo ; Aspuru-Guzik, Alán ; Balcells, 
David</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c491t-9672395cd34d93fdeeed40fba73143c10933db2307a6643b47e1adb3c28dbae43</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Chemistry</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Friederich, Pascal</creatorcontrib><creatorcontrib>dos Passos Gomes, Gabriel</creatorcontrib><creatorcontrib>De Bin, Riccardo</creatorcontrib><creatorcontrib>Aspuru-Guzik, Alán</creatorcontrib><creatorcontrib>Balcells, David</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><collection>NORA - Norwegian Open Research Archives</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Chemical science (Cambridge)</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Friederich, Pascal</au><au>dos Passos Gomes, Gabriel</au><au>De Bin, Riccardo</au><au>Aspuru-Guzik, Alán</au><au>Balcells, David</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Machine learning dihydrogen activation in the chemical space surrounding Vaska's complex</atitle><jtitle>Chemical science (Cambridge)</jtitle><addtitle>Chem Sci</addtitle><date>2020-05-14</date><risdate>2020</risdate><volume>11</volume><issue>18</issue><spage>4584</spage><epage>461</epage><pages>4584-461</pages><issn>2041-6520</issn><eissn>2041-6539</eissn><abstract>Homogeneous catalysis using transition metal complexes is ubiquitously used for organic synthesis, as well as technologically relevant in applications such as water splitting and CO 2 reduction. The key steps underlying homogeneous catalysis require a specific combination of electronic and steric effects from the ligands bound to the metal center. 
Finding the optimal combination of ligands is a challenging task due to the exceedingly large number of possibilities and the non-trivial ligand-ligand interactions. The classic example of Vaska's complex, trans -[Ir(PPh 3 ) 2 (CO)(Cl)], illustrates this scenario. The ligands of this species activate iridium for the oxidative addition of hydrogen, yielding the dihydride cis -[Ir(H) 2 (PPh 3 ) 2 (CO)(Cl)] complex. Despite the simplicity of this system, thousands of derivatives can be formulated for the activation of H 2 , with a limited number of ligands belonging to the same general categories found in the original complex. In this work, we show how DFT and machine learning (ML) methods can be combined to enable the prediction of reactivity within large chemical spaces containing thousands of complexes. In a space of 2574 species derived from Vaska's complex, data from DFT calculations are used to train and test ML models that predict the H 2 -activation barrier. In contrast to experiments and calculations requiring several days to be completed, the ML models were trained and used on a laptop on a time-scale of minutes. As a first approach, we combined Bayesian-optimized artificial neural networks (ANN) with features derived from autocorrelation and deltametric functions. The resulting ANNs achieved high accuracies, with mean absolute errors (MAE) between 1 and 2 kcal mol −1 , depending on the size of the training set. By using a Gaussian process (GP) model trained with a set of selected features, including fingerprints, accuracy was further enhanced. Remarkably, this GP model minimized the MAE below 1 kcal mol −1 , by using only 20% or less of the data available for training. The gradient boosting (GB) method was also used to assess the relevance of the features, which was used for both feature selection and model interpretation purposes. 
Features accounting for chemical composition, atom size and electronegativity were found to be the most determinant in the predictions. Further, the ligand fragments with the strongest influence on the H 2 -activation barrier were identified. A machine learning exploration of the chemical space surrounding Vaska's complex.</abstract><cop>England</cop><pub>Royal Society of Chemistry</pub><pmid>33224459</pmid><doi>10.1039/d0sc00445f</doi><tpages>18</tpages><orcidid>https://orcid.org/0000-0003-4465-1465</orcidid><orcidid>https://orcid.org/0000-0002-8235-5969</orcidid><orcidid>https://orcid.org/0000-0002-3389-0543</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 2041-6520
ispartof Chemical science (Cambridge), 2020-05, Vol.11 (18), p.4584-4601
issn 2041-6520
2041-6539
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_7659707
source NORA - Norwegian Open Research Archives; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; PubMed Central; PubMed Central Open Access
subjects Chemistry
title Machine learning dihydrogen activation in the chemical space surrounding Vaska's complex
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-05T04%3A11%3A59IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Machine%20learning%20dihydrogen%20activation%20in%20the%20chemical%20space%20surrounding%20Vaska's%20complex&rft.jtitle=Chemical%20science%20(Cambridge)&rft.au=Friederich,%20Pascal&rft.date=2020-05-14&rft.volume=11&rft.issue=18&rft.spage=4584&rft.epage=461&rft.pages=4584-461&rft.issn=2041-6520&rft.eissn=2041-6539&rft_id=info:doi/10.1039/d0sc00445f&rft_dat=%3Cproquest_pubme%3E2463602761%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2463602761&rft_id=info:pmid/33224459&rfr_iscdi=true