Constructing Regression Models with High Prediction Accuracy and Interpretability Based on Decision Tree and Random Forests
Models for predicting properties/activities of materials based on machine learning can lead to the discovery of new mechanisms underlying properties/activities of materials. However, methods for constructing models that exhibit both high prediction accuracy and interpretability remain a work in prog...
Published in: | Journal of Computer Chemistry, Japan, 2021, Vol.20(2), pp.71-87 |
---|---|
Main authors: | SHIMIZU, Naoto ; KANEKO, Hiromasa |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Full text |
container_end_page | 87 |
---|---|
container_issue | 2 |
container_start_page | 71 |
container_title | Journal of Computer Chemistry, Japan |
container_volume | 20 |
creator | SHIMIZU, Naoto ; KANEKO, Hiromasa |
description | Models for predicting properties/activities of materials based on machine learning can lead to the discovery of new mechanisms underlying properties/activities of materials. However, methods for constructing models that exhibit both high prediction accuracy and interpretability remain a work in progress because prediction accuracy and interpretability exhibit a trade-off relationship. In this study, we propose a new model-construction method that combines a decision tree (DT) with random forests (RF), which we therefore call DT-RF. In DT-RF, the datasets to be analyzed are divided by a DT model, and RF models are constructed for each subdataset. This enables global interpretation of the data based on the DT model, while the RF models improve the prediction accuracy and enable local interpretations. Case studies were performed using three datasets, namely, those containing data on the boiling point of compounds, their water solubility, and the transition temperature of inorganic superconductors. We examined the proposed method in terms of its validity, prediction accuracy, and interpretability. |
doi_str_mv | 10.2477/jccj.2020-0021 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2578517234</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2578517234</sourcerecordid><originalsourceid>FETCH-LOGICAL-c364t-ba17c588ae0f1bbd73e8144f37a107fa32b5d03e9105bde893d094e3bf2be63d3</originalsourceid><addsrcrecordid>eNo9kEtrAjEUhYfSQsW67TrQ9dg8Zsy4tFqrYGkRuw553NHIOGOTDEX65xsfdXPvhfOde-AkySPBfZpx_rzVetunmOIUY0pukg5hGU9ZQbPby034gN8nPe-twhjzHJN82El-x03tg2t1sPUaLWHtIBJNjd4bA5VHPzZs0MyuN-jTgbERi9pI69ZJfUCyNmheB3B7B0EqW9lwQC_Sg0ERm4C2p18rB3Bil3E0OzRtYkrwD8ldKSsPvcvuJl_T19V4li4-3ubj0SLVbJCFVEnCdV4UEnBJlDKcQUGyrGRcEsxLyajKDWYwJDhXBoohM3iYAVMlVTBghnWTp_PfvWu-25gstk3r6hgpaM6LnHDKskj1z5R2jfcOSrF3difdQRAsjh2LY8fi2LE4dhwNk7Nh64NcwxWXLlhdwT8u6Gn8266y3kgnoGZ_jGqJhA</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2578517234</pqid></control><display><type>article</type><title>Constructing Regression Models with High Prediction Accuracy and Interpretability Based on Decision Tree and Random Forests</title><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>J-STAGE (Japan Science & Technology Information Aggregator, Electronic) Freely Available Titles - Japanese</source><source>Free Full-Text Journals in Chemistry</source><creator>SHIMIZU, Naoto ; KANEKO, Hiromasa</creator><creatorcontrib>SHIMIZU, Naoto ; KANEKO, Hiromasa</creatorcontrib><description>Models for predicting properties/activities of materials based on machine learning can lead to the discovery of new mechanisms underlying properties/activities of materials. However, methods for constructing models that exhibit both high prediction accuracy and interpretability remain a work in progress because the prediction accuracy and interpretability exhibit a trade-off relationship. 
In this study, we propose a new model-construction method that combines decision tree (DT) with random forests (RF); which we therefore call DT-RF. In DT-RF, the datasets to be analyzed are divided by a DT model, and RF models are constructed for each subdataset. This enables global interpretation of the data based on the DT model, while the RT models improve the prediction accuracy and enable local interpretations. Case studies were performed using three datasets, namely, those containing data on the boiling point of compounds, their water solubility, and the transition temperature of inorganic superconductors. We examined the proposed method in terms of its validity, prediction accuracy, and interpretability.</description><identifier>ISSN: 1347-1767</identifier><identifier>EISSN: 1347-3824</identifier><identifier>DOI: 10.2477/jccj.2020-0021</identifier><language>eng</language><publisher>Tokyo: Society of Computer Chemistry, Japan</publisher><subject>Accuracy ; Boiling points ; Datasets ; Decision tree ; Decision trees ; Machine learning ; Model interpretability ; Predictive ability ; Random forests ; Regression model ; Regression models ; Superconductors ; Transition temperature ; Workflow</subject><ispartof>Journal of Computer Chemistry, Japan, 2021, Vol.20(2), pp.71-87</ispartof><rights>2021 Society of Computer Chemistry, Japan</rights><rights>Copyright Japan Science and Technology Agency 2021</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c364t-ba17c588ae0f1bbd73e8144f37a107fa32b5d03e9105bde893d094e3bf2be63d3</citedby><cites>FETCH-LOGICAL-c364t-ba17c588ae0f1bbd73e8144f37a107fa32b5d03e9105bde893d094e3bf2be63d3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,778,782,1879,4012,27912,27913,27914</link.rule.ids></links><search><creatorcontrib>SHIMIZU, 
Naoto</creatorcontrib><creatorcontrib>KANEKO, Hiromasa</creatorcontrib><title>Constructing Regression Models with High Prediction Accuracy and Interpretability Based on Decision Tree and Random Forests</title><title>Journal of Computer Chemistry, Japan</title><description>Models for predicting properties/activities of materials based on machine learning can lead to the discovery of new mechanisms underlying properties/activities of materials. However, methods for constructing models that exhibit both high prediction accuracy and interpretability remain a work in progress because the prediction accuracy and interpretability exhibit a trade-off relationship. In this study, we propose a new model-construction method that combines decision tree (DT) with random forests (RF); which we therefore call DT-RF. In DT-RF, the datasets to be analyzed are divided by a DT model, and RF models are constructed for each subdataset. This enables global interpretation of the data based on the DT model, while the RT models improve the prediction accuracy and enable local interpretations. Case studies were performed using three datasets, namely, those containing data on the boiling point of compounds, their water solubility, and the transition temperature of inorganic superconductors. 
We examined the proposed method in terms of its validity, prediction accuracy, and interpretability.</description><subject>Accuracy</subject><subject>Boiling points</subject><subject>Datasets</subject><subject>Decision tree</subject><subject>Decision trees</subject><subject>Machine learning</subject><subject>Model interpretability</subject><subject>Predictive ability</subject><subject>Random forests</subject><subject>Regression model</subject><subject>Regression models</subject><subject>Superconductors</subject><subject>Transition temperature</subject><subject>Workflow</subject><issn>1347-1767</issn><issn>1347-3824</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><recordid>eNo9kEtrAjEUhYfSQsW67TrQ9dg8Zsy4tFqrYGkRuw553NHIOGOTDEX65xsfdXPvhfOde-AkySPBfZpx_rzVetunmOIUY0pukg5hGU9ZQbPby034gN8nPe-twhjzHJN82El-x03tg2t1sPUaLWHtIBJNjd4bA5VHPzZs0MyuN-jTgbERi9pI69ZJfUCyNmheB3B7B0EqW9lwQC_Sg0ERm4C2p18rB3Bil3E0OzRtYkrwD8ldKSsPvcvuJl_T19V4li4-3ubj0SLVbJCFVEnCdV4UEnBJlDKcQUGyrGRcEsxLyajKDWYwJDhXBoohM3iYAVMlVTBghnWTp_PfvWu-25gstk3r6hgpaM6LnHDKskj1z5R2jfcOSrF3difdQRAsjh2LY8fi2LE4dhwNk7Nh64NcwxWXLlhdwT8u6Gn8266y3kgnoGZ_jGqJhA</recordid><startdate>2021</startdate><enddate>2021</enddate><creator>SHIMIZU, Naoto</creator><creator>KANEKO, Hiromasa</creator><general>Society of Computer Chemistry, Japan</general><general>Japan Science and Technology Agency</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>2021</creationdate><title>Constructing Regression Models with High Prediction Accuracy and Interpretability Based on Decision Tree and Random Forests</title><author>SHIMIZU, Naoto ; KANEKO, 
Hiromasa</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c364t-ba17c588ae0f1bbd73e8144f37a107fa32b5d03e9105bde893d094e3bf2be63d3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Accuracy</topic><topic>Boiling points</topic><topic>Datasets</topic><topic>Decision tree</topic><topic>Decision trees</topic><topic>Machine learning</topic><topic>Model interpretability</topic><topic>Predictive ability</topic><topic>Random forests</topic><topic>Regression model</topic><topic>Regression models</topic><topic>Superconductors</topic><topic>Transition temperature</topic><topic>Workflow</topic><toplevel>online_resources</toplevel><creatorcontrib>SHIMIZU, Naoto</creatorcontrib><creatorcontrib>KANEKO, Hiromasa</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Journal of Computer Chemistry, Japan</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>SHIMIZU, Naoto</au><au>KANEKO, Hiromasa</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Constructing Regression Models with High Prediction Accuracy and Interpretability Based on Decision Tree and Random Forests</atitle><jtitle>Journal of Computer Chemistry, Japan</jtitle><date>2021</date><risdate>2021</risdate><volume>20</volume><issue>2</issue><spage>71</spage><epage>87</epage><pages>71-87</pages><artnum>2020-0021</artnum><issn>1347-1767</issn><eissn>1347-3824</eissn><abstract>Models for predicting 
properties/activities of materials based on machine learning can lead to the discovery of new mechanisms underlying properties/activities of materials. However, methods for constructing models that exhibit both high prediction accuracy and interpretability remain a work in progress because the prediction accuracy and interpretability exhibit a trade-off relationship. In this study, we propose a new model-construction method that combines decision tree (DT) with random forests (RF); which we therefore call DT-RF. In DT-RF, the datasets to be analyzed are divided by a DT model, and RF models are constructed for each subdataset. This enables global interpretation of the data based on the DT model, while the RT models improve the prediction accuracy and enable local interpretations. Case studies were performed using three datasets, namely, those containing data on the boiling point of compounds, their water solubility, and the transition temperature of inorganic superconductors. We examined the proposed method in terms of its validity, prediction accuracy, and interpretability.</abstract><cop>Tokyo</cop><pub>Society of Computer Chemistry, Japan</pub><doi>10.2477/jccj.2020-0021</doi><tpages>17</tpages><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1347-1767 |
ispartof | Journal of Computer Chemistry, Japan, 2021, Vol.20(2), pp.71-87 |
issn | 1347-1767; 1347-3824 |
language | eng |
recordid | cdi_proquest_journals_2578517234 |
source | Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; J-STAGE (Japan Science & Technology Information Aggregator, Electronic) Freely Available Titles - Japanese; Free Full-Text Journals in Chemistry |
subjects | Accuracy; Boiling points; Datasets; Decision tree; Decision trees; Machine learning; Model interpretability; Predictive ability; Random forests; Regression model; Regression models; Superconductors; Transition temperature; Workflow |
title | Constructing Regression Models with High Prediction Accuracy and Interpretability Based on Decision Tree and Random Forests |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-15T08%3A10%3A20IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Constructing%20Regression%20Models%20with%20High%20Prediction%20Accuracy%20and%20Interpretability%20Based%20on%20Decision%20Tree%20and%20Random%20Forests&rft.jtitle=Journal%20of%20Computer%20Chemistry,%20Japan&rft.au=SHIMIZU,%20Naoto&rft.date=2021&rft.volume=20&rft.issue=2&rft.spage=71&rft.epage=87&rft.pages=71-87&rft.artnum=2020-0021&rft.issn=1347-1767&rft.eissn=1347-3824&rft_id=info:doi/10.2477/jccj.2020-0021&rft_dat=%3Cproquest_cross%3E2578517234%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2578517234&rft_id=info:pmid/&rfr_iscdi=true |
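
The description field above outlines DT-RF only in words: a decision tree divides the dataset, and a random forest is fitted on each resulting subdataset. The following is a minimal sketch of that idea, not the authors' implementation. It assumes scikit-learn as the library; the class name `DTRFRegressor`, the hyperparameter values, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor


class DTRFRegressor:
    """Illustrative DT-RF sketch: a shallow tree splits the data, one forest per leaf.

    Hypothetical helper class; hyperparameter defaults are assumptions,
    not values taken from the catalogued article.
    """

    def __init__(self, max_leaf_nodes=3, min_samples_leaf=20,
                 n_estimators=100, random_state=0):
        # The shallow tree provides the global, human-readable partition.
        self.tree = DecisionTreeRegressor(
            max_leaf_nodes=max_leaf_nodes,
            min_samples_leaf=min_samples_leaf,
            random_state=random_state,
        )
        self.n_estimators = n_estimators
        self.random_state = random_state
        self.forests = {}  # leaf index -> forest fitted on that leaf's samples

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        self.tree.fit(X, y)
        leaf_ids = self.tree.apply(X)  # leaf index of every training sample
        for leaf in np.unique(leaf_ids):
            mask = leaf_ids == leaf
            forest = RandomForestRegressor(
                n_estimators=self.n_estimators,
                random_state=self.random_state,
            )
            forest.fit(X[mask], y[mask])  # local model for this subdataset
            self.forests[leaf] = forest
        return self

    def predict(self, X):
        X = np.asarray(X)
        leaf_ids = self.tree.apply(X)  # route each sample to its leaf
        y_pred = np.empty(X.shape[0])
        for leaf, forest in self.forests.items():
            mask = leaf_ids == leaf
            if mask.any():
                y_pred[mask] = forest.predict(X[mask])
        return y_pred


# Synthetic data (an assumption): the target follows a different rule
# on each side of X[:, 0] == 0, so a tree split is meaningful.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(300, 4))
y = np.where(X[:, 0] > 0, 3.0 * X[:, 1], -2.0 * X[:, 2])
y = y + rng.normal(scale=0.05, size=300)

model = DTRFRegressor().fit(X, y)
pred = model.predict(X)
```

In this sketch the tree's split rules would serve as the global interpretation, and each forest's `feature_importances_` attribute is one possible basis for a local interpretation within a subdataset. Whether the article uses these exact quantities is not stated in this record.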