RGMDT: Return-Gap-Minimizing Decision Tree Extraction in Non-Euclidean Metric Space

Detailed Description

Deep Reinforcement Learning (DRL) algorithms have achieved great success in solving many challenging tasks, but their black-box nature hinders interpretability and real-world applicability, making it difficult for human experts to interpret and understand DRL policies. Existing work on interpretable reinforcement learning has shown promise in extracting decision tree (DT) based policies from DRL policies, mostly in single-agent settings, while prior attempts to introduce DT policies in multi-agent scenarios rely mainly on heuristic designs that provide no quantitative guarantees on the expected return. In this paper, we establish an upper bound on the return gap between the oracle expert policy and an optimal decision tree policy. This enables us to recast the DT extraction problem as a novel non-Euclidean clustering problem over the local observation and action-value space of each agent, with action values as cluster labels and the upper bound on the return gap as the clustering loss. Both the algorithm and the upper bound are extended to multi-agent decentralized DT extraction via an iteratively-grow-DT procedure guided by an action-value function conditioned on the current DTs of the other agents. Further, we propose the Return-Gap-Minimization Decision Tree (RGMDT) algorithm, a surprisingly simple design that is integrated with reinforcement learning through a novel Regularized Information Maximization loss. Evaluations on tasks such as D4RL show that RGMDT significantly outperforms heuristic DT-based baselines and can achieve nearly optimal returns under given DT complexity constraints (e.g., a maximum number of DT nodes).
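To make the core idea of the abstract concrete, the following is a minimal sketch (not the authors' code) of the extraction step it describes: observations are labeled by their action-value structure, here simplified to the greedy action under a stand-in Q-function, and a size-limited decision tree is then fit over raw observations to reproduce those labels. All names (observations, q_values, max_leaf_nodes) and the synthetic data are illustrative assumptions; the actual RGMDT algorithm additionally uses a return-gap-derived clustering loss and a Regularized Information Maximization objective, which this toy omits.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Stand-in for an agent's local observations and a learned Q-function Q(o, a).
observations = rng.normal(size=(1000, 4))   # 1000 observations, 4 features
weights = rng.normal(size=(4, 3))           # toy linear Q over 3 actions
q_values = observations @ weights           # shape (1000, 3)

# Step 1: use action values as cluster labels (here, the greedy action per observation).
cluster_labels = q_values.argmax(axis=1)

# Step 2: extract a decision tree of bounded complexity that maps observations
# to those labels; the leaf budget plays the role of the DT node constraint.
tree = DecisionTreeClassifier(max_leaf_nodes=8, random_state=0)
tree.fit(observations, cluster_labels)

# A rough proxy for the return gap: how often the DT's action disagrees with
# the greedy action of the (expert) Q-function.
disagreement = np.mean(tree.predict(observations) != cluster_labels)
print(f"fraction of observations where the DT deviates from the greedy policy: {disagreement:.3f}")

Tightening max_leaf_nodes in this sketch trades interpretability against fidelity to the expert policy, which mirrors the complexity-versus-return trade-off the paper quantifies with its return-gap bound.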

Bibliographic Details
Main Authors: Chen, Jingdi; Zhou, Hanhan; Mei, Yongsheng; Joe-Wong, Carlee; Adam, Gina; Bastian, Nathaniel D; Lan, Tian
Format: Article
Language: English
Published: 2024-10-21
Subjects: Computer Science - Artificial Intelligence; Computer Science - Learning
Online Access: https://arxiv.org/abs/2410.16517
DOI: 10.48550/arXiv.2410.16517
Source: arXiv.org