RL-LLM-DT: An Automatic Decision Tree Generation Method Based on RL Evaluation and LLM Enhancement
Traditionally, AI development for two-player zero-sum games has relied on two primary techniques: decision trees and reinforcement learning (RL). A common approach involves using a fixed decision tree as one player's strategy while training an RL agent as the opponent to identify vulnerabilitie...
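Main authors: | Lin, Junjie ; Zhao, Jian ; Liu, Lin ; Deng, Yue ; Zhao, Youpeng ; Huang, Lanxiao ; Lin, Xia ; Zhou, Wengang ; Li, Houqiang |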
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Lin, Junjie ; Zhao, Jian ; Liu, Lin ; Deng, Yue ; Zhao, Youpeng ; Huang, Lanxiao ; Lin, Xia ; Zhou, Wengang ; Li, Houqiang |
description | Traditionally, AI development for two-player zero-sum games has relied on two
primary techniques: decision trees and reinforcement learning (RL). A common
approach involves using a fixed decision tree as one player's strategy while
training an RL agent as the opponent to identify vulnerabilities in the
decision tree, thereby improving its strategic strength iteratively. However,
this process often requires significant human intervention to refine the
decision tree after identifying its weaknesses, resulting in inefficiencies and
hindering full automation of the strategy enhancement process. Fortunately, the
advent of Large Language Models (LLMs) offers a transformative opportunity to
automate the process. We propose RL-LLM-DT, an automatic decision tree
generation method based on RL Evaluation and LLM Enhancement. Given an initial
decision tree, the method involves two important iterative steps. Response
Policy Search: RL is used to discover counter-strategies targeting the decision
tree. Policy Improvement: LLMs analyze failure scenarios and generate improved
decision tree code. In our method, RL focuses on finding the decision tree's
flaws while LLM is prompted to generate an improved version of the decision
tree. The iterative refinement process terminates when RL can't find any flaw
of the tree or LLM fails to improve the tree. To evaluate the effectiveness of
this integrated approach, we conducted experiments in a curling game. After
iterative refinements, our curling AI based on the decision tree ranks first on
the Jidi platform among 34 curling AIs in total, which demonstrates that LLMs
can significantly enhance the robustness and adaptability of decision trees,
representing a substantial advancement in the field of Game AI. Our code is
available at https://github.com/Linjunjie99/RL-LLM-DT. |
doi_str_mv | 10.48550/arxiv.2412.11417 |
format | Article |
creationdate | 2024-12-15 |
rights | http://arxiv.org/licenses/nonexclusive-distrib/1.0 |
oa | free_for_read |
linktorsrc | https://arxiv.org/abs/2412.11417 |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2412.11417 |
ispartof | |
issn | |
language | eng |
recordid | cdi_arxiv_primary_2412_11417 |
source | arXiv.org |
subjects | Computer Science - Artificial Intelligence ; Computer Science - Learning |
title | RL-LLM-DT: An Automatic Decision Tree Generation Method Based on RL Evaluation and LLM Enhancement |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-01T07%3A43%3A31IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=RL-LLM-DT:%20An%20Automatic%20Decision%20Tree%20Generation%20Method%20Based%20on%20RL%20Evaluation%20and%20LLM%20Enhancement&rft.au=Lin,%20Junjie&rft.date=2024-12-15&rft_id=info:doi/10.48550/arxiv.2412.11417&rft_dat=%3Carxiv_GOX%3E2412_11417%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
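
The abstract describes an iterative loop: Response Policy Search finds a counter-strategy, Policy Improvement produces a revised tree, and the loop stops when no flaw is found or no improvement is produced. The following is a minimal Python sketch of that control flow only, not the authors' implementation. The names `refine_decision_tree`, `find_counter`, `improve`, and `max_iters` are hypothetical; the two callbacks stand in for the RL search and the LLM call, and the toy "tree" is just a set of situations it handles.

```python
from typing import Callable, List, Optional, Tuple, TypeVar

Tree = TypeVar("Tree")
Counter = TypeVar("Counter")


def refine_decision_tree(
    tree: Tree,
    find_counter: Callable[[Tree], Optional[Counter]],
    improve: Callable[[Tree, Counter], Optional[Tree]],
    max_iters: int = 10,  # assumption: a safety cap, not mentioned in the abstract
) -> Tuple[Tree, List[Tree]]:
    """Alternate counter-strategy search and tree improvement until one fails."""
    history = [tree]
    for _ in range(max_iters):
        # Response Policy Search (stand-in for RL): look for a flaw in the tree.
        counter = find_counter(tree)
        if counter is None:
            break  # no flaw found, so the loop terminates
        # Policy Improvement (stand-in for the LLM): propose a revised tree.
        candidate = improve(tree, counter)
        if candidate is None or candidate == tree:
            break  # no improvement produced, so the loop terminates
        tree = candidate
        history.append(tree)
    return tree, history


# Toy stand-ins (hypothetical): a "tree" is the set of situations it handles.
SITUATIONS = ("guard", "draw", "takeout")


def toy_find_counter(tree: frozenset) -> Optional[str]:
    """Return the first situation the tree does not handle, if any."""
    for situation in SITUATIONS:
        if situation not in tree:
            return situation
    return None


def toy_improve(tree: frozenset, counter: str) -> frozenset:
    """Return a new tree that also handles the exposed situation."""
    return tree | {counter}


final_tree, history = refine_decision_tree(
    frozenset({"draw"}), toy_find_counter, toy_improve
)
print(sorted(final_tree), len(history))
```

In a real setting `find_counter` would train an opponent against the fixed tree and `improve` would prompt an LLM with the failure cases; both are left abstract here because the abstract gives no further detail.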