A Machine Learning and Explainable AI Framework Tailored for Unbalanced Experimental Catalyst Discovery

The successful application of machine learning (ML) in catalyst design has been made difficult by the challenges associated with collecting high-quality and diverse data. Due to the complex interactions between catalyst components, the design of novel catalysts has long relied on trial-and-error, a...

Bibliographic details
Published in: Journal of physical chemistry. C 2024-12, Vol.128 (50), p.21349-21367
Main authors: Semnani, Parastoo, Bogojeski, Mihail, Bley, Florian, Zhang, Zizheng, Wu, Qiong, Kneib, Thomas, Herrmann, Jan, Weisser, Christoph, Patcas, Florina, Müller, Klaus-Robert
Format: Article
Language: English
Subjects: C: Chemical and Catalytic Reactivity at Interfaces
Online access: Full text
container_end_page 21367
container_issue 50
container_start_page 21349
container_title Journal of physical chemistry. C
container_volume 128
creator Semnani, Parastoo
Bogojeski, Mihail
Bley, Florian
Zhang, Zizheng
Wu, Qiong
Kneib, Thomas
Herrmann, Jan
Weisser, Christoph
Patcas, Florina
Müller, Klaus-Robert
description The successful application of machine learning (ML) in catalyst design has been made difficult by the challenges associated with collecting high-quality and diverse data. Due to the complex interactions between catalyst components, the design of novel catalysts has long relied on trial-and-error, a costly and labor-intensive process that results in scarce data that is heavily biased toward undesired, low-yield catalysts. Such data presents a challenge for training ML models that generalize well to novel compositions, which is necessary for the success of ML-guided catalyst discovery. Despite the growing popularity of ML applications in this field, most efforts so far have not focused on dealing with the challenges presented by such experimental data. In this work, we introduce a robust ML and explainable artificial intelligence (XAI) framework that incorporates a series of well-established ML methods designed to improve model performance and provide reliable evaluations for catalytic yield classification in the context of scarce and class-imbalanced data. We apply this framework to classify the yields of different catalyst combinations in the oxidative coupling of methane reaction and use it to evaluate the performance of a range of ML models: tree-based models (such as decision trees, random forest, and gradient boosted trees), logistic regression, support vector machines, and neural networks. Our experiments demonstrate that the methods used in our framework lead to more robust performance estimates and reduce the effect of class imbalance on model training, resulting in significant improvements in the predictive capability of all but one of the evaluated models. Additionally, the XAI component of the framework analyzes the decision-making process of each ML model by identifying the most important features for predicting catalyst performance. Our analysis found that XAI methods that provide class-aware explanations, such as Layer-wise Relevance Propagation, managed to identify key components that contribute specifically to high-yield catalysts. These findings align with chemical intuition and existing literature, reinforcing their validity. We believe this framework can serve as a blueprint and a set of best practices for ML applications in catalysis, driving future research while delivering robust models and actionable insights that can assist chemists in designing and discovering novel catalysts with superior performance.
doi_str_mv 10.1021/acs.jpcc.4c05332
format Article
fullrecord <record><control><sourceid>acs_cross</sourceid><recordid>TN_cdi_crossref_primary_10_1021_acs_jpcc_4c05332</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>d112994531</sourcerecordid><originalsourceid>FETCH-LOGICAL-a1561-ca140dcb29e1caf098dfacd670e2e0ccb26361ea96eff8289f47fb63d5cae42d3</originalsourceid><addsrcrecordid>eNp1kElPAkEQhTtGExG9e-wf4GAvszBHgqAkGC9wntRUV-Pg0EO6ceHf2yzx5qWqkvde5eVj7F6KgRRKPgKGwXqLOEhRZFqrC9aTpVZJkWbZ5d-dFtfsJoS1iB4hdY-tRvwV8L1xxOcE3jVuxcEZPvnZttA4qFvioxmfetjQd-c_-AKatvNkuO08X7oaWnBIxwD5ZkNuBy0fQ5z7sONPTcDui_z-ll1ZaAPdnXefLaeTxfglmb89z8ajeQIyy2WCIFNhsFYlSQQryqGxgCYvBCkSGIVc55KgzMnaoRqWNi1snWuTIVCqjO4zcfqLvgvBk622sRX4fSVFdQBVRVDVAVR1BhUjD6fIUek-vYsF_7f_ArKJbqY</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>A Machine Learning and Explainable AI Framework Tailored for Unbalanced Experimental Catalyst Discovery</title><source>ACS Publications</source><creator>Semnani, Parastoo ; Bogojeski, Mihail ; Bley, Florian ; Zhang, Zizheng ; Wu, Qiong ; Kneib, Thomas ; Herrmann, Jan ; Weisser, Christoph ; Patcas, Florina ; Müller, Klaus-Robert</creator><creatorcontrib>Semnani, Parastoo ; Bogojeski, Mihail ; Bley, Florian ; Zhang, Zizheng ; Wu, Qiong ; Kneib, Thomas ; Herrmann, Jan ; Weisser, Christoph ; Patcas, Florina ; Müller, Klaus-Robert</creatorcontrib><identifier>ISSN: 1932-7447</identifier><identifier>EISSN: 1932-7455</identifier><identifier>DOI: 10.1021/acs.jpcc.4c05332</identifier><language>eng</language><publisher>American Chemical Society</publisher><subject>C: Chemical and Catalytic Reactivity at Interfaces</subject><ispartof>Journal of physical chemistry. C, 2024-12, Vol.128 (50), p.21349-21367</ispartof><rights>2024 The Authors. Published by American Chemical Society</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-a1561-ca140dcb29e1caf098dfacd670e2e0ccb26361ea96eff8289f47fb63d5cae42d3</cites><orcidid>0009-0002-6607-6756 ; 0000-0002-3861-7685 ; 0000-0002-1839-7320</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://pubs.acs.org/doi/pdf/10.1021/acs.jpcc.4c05332$$EPDF$$P50$$Gacs$$H</linktopdf><linktohtml>$$Uhttps://pubs.acs.org/doi/10.1021/acs.jpcc.4c05332$$EHTML$$P50$$Gacs$$H</linktohtml><link.rule.ids>314,780,784,2763,27075,27923,27924,56737,56787</link.rule.ids></links><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Semnani, Parastoo</au><au>Bogojeski, Mihail</au><au>Bley, Florian</au><au>Zhang, Zizheng</au><au>Wu, Qiong</au><au>Kneib, Thomas</au><au>Herrmann, Jan</au><au>Weisser, Christoph</au><au>Patcas, Florina</au><au>Müller, Klaus-Robert</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A Machine Learning and Explainable AI Framework Tailored for Unbalanced Experimental Catalyst Discovery</atitle><jtitle>Journal of physical chemistry. C</jtitle><addtitle>J. Phys. Chem. C</addtitle><date>2024-12-19</date><risdate>2024</risdate><volume>128</volume><issue>50</issue><spage>21349</spage><epage>21367</epage><pages>21349-21367</pages><issn>1932-7447</issn><eissn>1932-7455</eissn><pub>American Chemical Society</pub><doi>10.1021/acs.jpcc.4c05332</doi><tpages>19</tpages><orcidid>https://orcid.org/0009-0002-6607-6756</orcidid><orcidid>https://orcid.org/0000-0002-3861-7685</orcidid><orcidid>https://orcid.org/0000-0002-1839-7320</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1932-7447
ispartof Journal of physical chemistry. C, 2024-12, Vol.128 (50), p.21349-21367
issn 1932-7447
1932-7455
language eng
recordid cdi_crossref_primary_10_1021_acs_jpcc_4c05332
source ACS Publications
subjects C: Chemical and Catalytic Reactivity at Interfaces
title A Machine Learning and Explainable AI Framework Tailored for Unbalanced Experimental Catalyst Discovery
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-12T13%3A54%3A50IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-acs_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Machine%20Learning%20and%20Explainable%20AI%20Framework%20Tailored%20for%20Unbalanced%20Experimental%20Catalyst%20Discovery&rft.jtitle=Journal%20of%20physical%20chemistry.%20C&rft.au=Semnani,%20Parastoo&rft.date=2024-12-19&rft.volume=128&rft.issue=50&rft.spage=21349&rft.epage=21367&rft.pages=21349-21367&rft.issn=1932-7447&rft.eissn=1932-7455&rft_id=info:doi/10.1021/acs.jpcc.4c05332&rft_dat=%3Cacs_cross%3Ed112994531%3C/acs_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true