Constructing Regression Models with High Prediction Accuracy and Interpretability Based on Decision Tree and Random Forests

Models for predicting properties/activities of materials based on machine learning can lead to the discovery of new mechanisms underlying properties/activities of materials. However, methods for constructing models that exhibit both high prediction accuracy and interpretability remain a work in prog...

Bibliographic details
Published in: Journal of Computer Chemistry, Japan, 2021, Vol.20(2), pp.71-87
Main authors: SHIMIZU, Naoto; KANEKO, Hiromasa
Format: Article
Language: English
Online access: Full text
container_end_page 87
container_issue 2
container_start_page 71
container_title Journal of Computer Chemistry, Japan
container_volume 20
creator SHIMIZU, Naoto
KANEKO, Hiromasa
description Models that predict the properties or activities of materials using machine learning can lead to the discovery of new mechanisms underlying those properties or activities. However, methods for constructing models that combine high prediction accuracy with interpretability remain a work in progress because prediction accuracy and interpretability exhibit a trade-off. In this study, we propose a new model-construction method that combines a decision tree (DT) with random forests (RF), which we call DT-RF. In DT-RF, the dataset to be analyzed is divided by a DT model, and an RF model is constructed for each subdataset. This enables global interpretation of the data based on the DT model, while the RF models improve the prediction accuracy and enable local interpretations. Case studies were performed using three datasets, containing data on the boiling points of compounds, the water solubility of compounds, and the transition temperatures of inorganic superconductors. We examined the proposed method in terms of its validity, prediction accuracy, and interpretability.
doi_str_mv 10.2477/jccj.2020-0021
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2578517234</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2578517234</sourcerecordid><originalsourceid>FETCH-LOGICAL-c364t-ba17c588ae0f1bbd73e8144f37a107fa32b5d03e9105bde893d094e3bf2be63d3</originalsourceid><addsrcrecordid>eNo9kEtrAjEUhYfSQsW67TrQ9dg8Zsy4tFqrYGkRuw553NHIOGOTDEX65xsfdXPvhfOde-AkySPBfZpx_rzVetunmOIUY0pukg5hGU9ZQbPby034gN8nPe-twhjzHJN82El-x03tg2t1sPUaLWHtIBJNjd4bA5VHPzZs0MyuN-jTgbERi9pI69ZJfUCyNmheB3B7B0EqW9lwQC_Sg0ERm4C2p18rB3Bil3E0OzRtYkrwD8ldKSsPvcvuJl_T19V4li4-3ubj0SLVbJCFVEnCdV4UEnBJlDKcQUGyrGRcEsxLyajKDWYwJDhXBoohM3iYAVMlVTBghnWTp_PfvWu-25gstk3r6hgpaM6LnHDKskj1z5R2jfcOSrF3difdQRAsjh2LY8fi2LE4dhwNk7Nh64NcwxWXLlhdwT8u6Gn8266y3kgnoGZ_jGqJhA</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2578517234</pqid></control><display><type>article</type><title>Constructing Regression Models with High Prediction Accuracy and Interpretability Based on Decision Tree and Random Forests</title><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>J-STAGE (Japan Science &amp; Technology Information Aggregator, Electronic) Freely Available Titles - Japanese</source><source>Free Full-Text Journals in Chemistry</source><creator>SHIMIZU, Naoto ; KANEKO, Hiromasa</creator><creatorcontrib>SHIMIZU, Naoto ; KANEKO, Hiromasa</creatorcontrib><description>Models for predicting properties/activities of materials based on machine learning can lead to the discovery of new mechanisms underlying properties/activities of materials. However, methods for constructing models that exhibit both high prediction accuracy and interpretability remain a work in progress because the prediction accuracy and interpretability exhibit a trade-off relationship. 
In this study, we propose a new model-construction method that combines decision tree (DT) with random forests (RF), which we therefore call DT-RF. In DT-RF, the datasets to be analyzed are divided by a DT model, and RF models are constructed for each subdataset. This enables global interpretation of the data based on the DT model, while the RF models improve the prediction accuracy and enable local interpretations. Case studies were performed using three datasets, namely, those containing data on the boiling point of compounds, their water solubility, and the transition temperature of inorganic superconductors. We examined the proposed method in terms of its validity, prediction accuracy, and interpretability.</description><identifier>ISSN: 1347-1767</identifier><identifier>EISSN: 1347-3824</identifier><identifier>DOI: 10.2477/jccj.2020-0021</identifier><language>eng</language><publisher>Tokyo: Society of Computer Chemistry, Japan</publisher><subject>Accuracy ; Boiling points ; Datasets ; Decision tree ; Decision trees ; Machine learning ; Model interpretability ; Predictive ability ; Random forests ; Regression model ; Regression models ; Superconductors ; Transition temperature ; Workflow</subject><ispartof>Journal of Computer Chemistry, Japan, 2021, Vol.20(2), pp.71-87</ispartof><rights>2021 Society of Computer Chemistry, Japan</rights><rights>Copyright Japan Science and Technology Agency 2021</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c364t-ba17c588ae0f1bbd73e8144f37a107fa32b5d03e9105bde893d094e3bf2be63d3</citedby><cites>FETCH-LOGICAL-c364t-ba17c588ae0f1bbd73e8144f37a107fa32b5d03e9105bde893d094e3bf2be63d3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,778,782,1879,4012,27912,27913,27914</link.rule.ids></links><search><creatorcontrib>SHIMIZU, 
Naoto</creatorcontrib><creatorcontrib>KANEKO, Hiromasa</creatorcontrib><title>Constructing Regression Models with High Prediction Accuracy and Interpretability Based on Decision Tree and Random Forests</title><title>Journal of Computer Chemistry, Japan</title><description>Models for predicting properties/activities of materials based on machine learning can lead to the discovery of new mechanisms underlying properties/activities of materials. However, methods for constructing models that exhibit both high prediction accuracy and interpretability remain a work in progress because the prediction accuracy and interpretability exhibit a trade-off relationship. In this study, we propose a new model-construction method that combines decision tree (DT) with random forests (RF); which we therefore call DT-RF. In DT-RF, the datasets to be analyzed are divided by a DT model, and RF models are constructed for each subdataset. This enables global interpretation of the data based on the DT model, while the RT models improve the prediction accuracy and enable local interpretations. Case studies were performed using three datasets, namely, those containing data on the boiling point of compounds, their water solubility, and the transition temperature of inorganic superconductors. 
We examined the proposed method in terms of its validity, prediction accuracy, and interpretability.</description><subject>Accuracy</subject><subject>Boiling points</subject><subject>Datasets</subject><subject>Decision tree</subject><subject>Decision trees</subject><subject>Machine learning</subject><subject>Model interpretability</subject><subject>Predictive ability</subject><subject>Random forests</subject><subject>Regression model</subject><subject>Regression models</subject><subject>Superconductors</subject><subject>Transition temperature</subject><subject>Workflow</subject><issn>1347-1767</issn><issn>1347-3824</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><recordid>eNo9kEtrAjEUhYfSQsW67TrQ9dg8Zsy4tFqrYGkRuw553NHIOGOTDEX65xsfdXPvhfOde-AkySPBfZpx_rzVetunmOIUY0pukg5hGU9ZQbPby034gN8nPe-twhjzHJN82El-x03tg2t1sPUaLWHtIBJNjd4bA5VHPzZs0MyuN-jTgbERi9pI69ZJfUCyNmheB3B7B0EqW9lwQC_Sg0ERm4C2p18rB3Bil3E0OzRtYkrwD8ldKSsPvcvuJl_T19V4li4-3ubj0SLVbJCFVEnCdV4UEnBJlDKcQUGyrGRcEsxLyajKDWYwJDhXBoohM3iYAVMlVTBghnWTp_PfvWu-25gstk3r6hgpaM6LnHDKskj1z5R2jfcOSrF3difdQRAsjh2LY8fi2LE4dhwNk7Nh64NcwxWXLlhdwT8u6Gn8266y3kgnoGZ_jGqJhA</recordid><startdate>2021</startdate><enddate>2021</enddate><creator>SHIMIZU, Naoto</creator><creator>KANEKO, Hiromasa</creator><general>Society of Computer Chemistry, Japan</general><general>Japan Science and Technology Agency</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>2021</creationdate><title>Constructing Regression Models with High Prediction Accuracy and Interpretability Based on Decision Tree and Random Forests</title><author>SHIMIZU, Naoto ; KANEKO, 
Hiromasa</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c364t-ba17c588ae0f1bbd73e8144f37a107fa32b5d03e9105bde893d094e3bf2be63d3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Accuracy</topic><topic>Boiling points</topic><topic>Datasets</topic><topic>Decision tree</topic><topic>Decision trees</topic><topic>Machine learning</topic><topic>Model interpretability</topic><topic>Predictive ability</topic><topic>Random forests</topic><topic>Regression model</topic><topic>Regression models</topic><topic>Superconductors</topic><topic>Transition temperature</topic><topic>Workflow</topic><toplevel>online_resources</toplevel><creatorcontrib>SHIMIZU, Naoto</creatorcontrib><creatorcontrib>KANEKO, Hiromasa</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Journal of Computer Chemistry, Japan</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>SHIMIZU, Naoto</au><au>KANEKO, Hiromasa</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Constructing Regression Models with High Prediction Accuracy and Interpretability Based on Decision Tree and Random Forests</atitle><jtitle>Journal of Computer Chemistry, Japan</jtitle><date>2021</date><risdate>2021</risdate><volume>20</volume><issue>2</issue><spage>71</spage><epage>87</epage><pages>71-87</pages><artnum>2020-0021</artnum><issn>1347-1767</issn><eissn>1347-3824</eissn><abstract>Models for predicting 
properties/activities of materials based on machine learning can lead to the discovery of new mechanisms underlying properties/activities of materials. However, methods for constructing models that exhibit both high prediction accuracy and interpretability remain a work in progress because the prediction accuracy and interpretability exhibit a trade-off relationship. In this study, we propose a new model-construction method that combines decision tree (DT) with random forests (RF), which we therefore call DT-RF. In DT-RF, the datasets to be analyzed are divided by a DT model, and RF models are constructed for each subdataset. This enables global interpretation of the data based on the DT model, while the RF models improve the prediction accuracy and enable local interpretations. Case studies were performed using three datasets, namely, those containing data on the boiling point of compounds, their water solubility, and the transition temperature of inorganic superconductors. We examined the proposed method in terms of its validity, prediction accuracy, and interpretability.</abstract><cop>Tokyo</cop><pub>Society of Computer Chemistry, Japan</pub><doi>10.2477/jccj.2020-0021</doi><tpages>17</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1347-1767
ispartof Journal of Computer Chemistry, Japan, 2021, Vol.20(2), pp.71-87
issn 1347-1767
1347-3824
language eng
recordid cdi_proquest_journals_2578517234
source Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; J-STAGE (Japan Science & Technology Information Aggregator, Electronic) Freely Available Titles - Japanese; Free Full-Text Journals in Chemistry
subjects Accuracy
Boiling points
Datasets
Decision tree
Decision trees
Machine learning
Model interpretability
Predictive ability
Random forests
Regression model
Regression models
Superconductors
Transition temperature
Workflow
title Constructing Regression Models with High Prediction Accuracy and Interpretability Based on Decision Tree and Random Forests
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-15T08%3A10%3A20IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Constructing%20Regression%20Models%20with%20High%20Prediction%20Accuracy%20and%20Interpretability%20Based%20on%20Decision%20Tree%20and%20Random%20Forests&rft.jtitle=Journal%20of%20Computer%20Chemistry,%20Japan&rft.au=SHIMIZU,%20Naoto&rft.date=2021&rft.volume=20&rft.issue=2&rft.spage=71&rft.epage=87&rft.pages=71-87&rft.artnum=2020-0021&rft.issn=1347-1767&rft.eissn=1347-3824&rft_id=info:doi/10.2477/jccj.2020-0021&rft_dat=%3Cproquest_cross%3E2578517234%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2578517234&rft_id=info:pmid/&rfr_iscdi=true
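
The DT-RF workflow summarized in the description field of this record (a decision tree divides the dataset, then one random forest is built per subdataset) can be sketched roughly as follows. This is a dependency-free toy under stated assumptions, not the authors' implementation: the function names (`fit_tree`, `find_leaf`, `fit_forest`, `fit_dt_rf`, `predict_dt_rf`), every hyperparameter, and the synthetic data are hypothetical, since the abstract specifies none of them. In practice an established machine-learning library would likely replace the hand-written tree and forest.

```python
import random
from statistics import mean


def sse(values):
    """Sum of squared errors of the values around their mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def fit_tree(X, y, max_depth, min_leaf, n_try=None, rng=None):
    """Fit a small regression tree (nested dicts). Requires min_leaf >= 1.

    If n_try is given, only n_try randomly chosen features are tried at
    each node, which is the per-split randomization used in random forests.
    """
    n_features = len(X[0])
    features = range(n_features) if n_try is None else rng.sample(range(n_features), n_try)
    best = None
    if max_depth > 0:
        for j in features:
            for t in sorted({row[j] for row in X}):
                left = [i for i, row in enumerate(X) if row[j] <= t]
                right = [i for i, row in enumerate(X) if row[j] > t]
                if len(left) < min_leaf or len(right) < min_leaf:
                    continue
                score = sse([y[i] for i in left]) + sse([y[i] for i in right])
                if best is None or score < best[0]:
                    best = (score, j, t, left, right)
    if best is None:  # depth exhausted or no admissible split: make a leaf
        return {"value": mean(y)}
    _, j, t, left, right = best
    return {
        "feature": j,
        "threshold": t,
        "left": fit_tree([X[i] for i in left], [y[i] for i in left],
                         max_depth - 1, min_leaf, n_try, rng),
        "right": fit_tree([X[i] for i in right], [y[i] for i in right],
                          max_depth - 1, min_leaf, n_try, rng),
    }


def find_leaf(tree, x):
    """Route x down the tree; return (path such as 'LR', leaf mean)."""
    path = ""
    while "value" not in tree:
        go_left = x[tree["feature"]] <= tree["threshold"]
        path += "L" if go_left else "R"
        tree = tree["left"] if go_left else tree["right"]
    return path, tree["value"]


def fit_forest(X, y, n_trees, rng, max_depth=4, min_leaf=2):
    """Bagged trees with random feature subsets (illustrative settings)."""
    n_try = max(1, len(X[0]) // 3)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        trees.append(fit_tree([X[i] for i in idx], [y[i] for i in idx],
                              max_depth, min_leaf, n_try, rng))
    return trees


def fit_dt_rf(X, y, dt_depth=1, dt_min_leaf=10, n_trees=20, seed=0):
    """Step 1: a shallow DT divides the data. Step 2: one forest per DT leaf."""
    rng = random.Random(seed)
    dt = fit_tree(X, y, dt_depth, dt_min_leaf)
    groups = {}
    for xi, yi in zip(X, y):
        leaf, _ = find_leaf(dt, xi)
        gx, gy = groups.setdefault(leaf, ([], []))
        gx.append(xi)
        gy.append(yi)
    forests = {leaf: fit_forest(gx, gy, n_trees, rng) for leaf, (gx, gy) in groups.items()}
    return dt, forests


def predict_dt_rf(model, x):
    """Find the DT leaf for x, then average that leaf's forest predictions."""
    dt, forests = model
    leaf, _ = find_leaf(dt, x)
    return mean([find_leaf(tree, x)[1] for tree in forests[leaf]])


if __name__ == "__main__":
    # Synthetic data: feature 1 is a regime flag that flips the response.
    X = [[k / 40, r] for k in range(40) for r in (0, 1)]
    y = [(10 + 3 * x0) if r == 0 else (-10 - 3 * x0) for x0, r in X]
    model = fit_dt_rf(X, y)
    print("DT split:", model[0]["feature"], model[0]["threshold"])
    print("prediction:", predict_dt_rf(model, [0.5, 0]))
```

In this sketch the shallow tree plays the globally interpretable role (its single split can be read directly), while each per-leaf forest only ever sees the samples routed to that leaf, which is what allows a local interpretation per subdataset.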