Approximate Policy Iteration with a Policy Language Bias: Solving Relational Markov Decision Processes

We study an approach to policy selection for large relational Markov Decision Processes (MDPs). We consider a variant of approximate policy iteration (API) that replaces the usual value-function learning step with a learning step in policy space. This is advantageous in domains where good policies are easier to represent and learn than the corresponding value functions, which is often the case for the relational MDPs we are interested in. In order to apply API to such problems, we introduce a relational policy language and corresponding learner. In addition, we introduce a new bootstrapping routine for goal-based planning domains, based on random walks. Such bootstrapping is necessary for many large relational MDPs, where reward is extremely sparse, as API is ineffective in such domains when initialized with an uninformed policy. Our experiments show that the resulting system is able to find good policies for a number of classical planning domains and their stochastic variants by solving them as extremely large relational MDPs. The experiments also point to some limitations of our approach, suggesting future work.
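The abstract's central idea, running policy iteration with a supervised policy-learning step in place of value-function fitting, bootstrapped by random-walk problems when reward is sparse, can be illustrated with a minimal sketch. The code below is an assumption-laden illustration, not the authors' system: every identifier (rollout_value, policy_space_api, random_walk_goal, the environment callables) is hypothetical, and the paper's relational policy language and learner are abstracted into a generic learn_policy classifier.

```python
# Illustrative sketch only: approximate policy iteration where improvement is
# done by policy rollouts and the "learning step" fits a policy (a state ->
# action classifier) rather than a value function. All names are hypothetical.
from typing import Callable, Hashable, List, Tuple

State = Hashable
Action = str

def rollout_value(env_step: Callable[[State, Action], Tuple[State, float]],
                  policy: Callable[[State], Action],
                  state: State, action: Action,
                  horizon: int = 20, width: int = 5, gamma: float = 0.95) -> float:
    """Monte-Carlo estimate of Q^pi(state, action): take `action`, then follow
    `policy` for `horizon` steps, averaged over `width` sampled trajectories."""
    total = 0.0
    for _ in range(width):
        s, a, ret, discount = state, action, 0.0, 1.0
        for _ in range(horizon):
            s, r = env_step(s, a)
            ret += discount * r
            discount *= gamma
            a = policy(s)
        total += ret
    return total / width

def policy_space_api(env_step: Callable[[State, Action], Tuple[State, float]],
                     actions: List[Action],
                     training_states: List[State],
                     learn_policy: Callable[[List[Tuple[State, Action]]],
                                            Callable[[State], Action]],
                     initial_policy: Callable[[State], Action],
                     iterations: int = 5) -> Callable[[State], Action]:
    """API loop with a policy-space learning step: label each training state
    with the action whose rollout value under the current policy is best, then
    fit the next policy to those (state, action) pairs by supervised learning."""
    policy = initial_policy
    for _ in range(iterations):
        dataset = [(s, max(actions,
                           key=lambda a: rollout_value(env_step, policy, s, a)))
                   for s in training_states]
        policy = learn_policy(dataset)  # stands in for the relational learner
    return policy

def random_walk_goal(random_step: Callable[[State], State],
                     start: State, n: int) -> State:
    """Random-walk bootstrapping idea for sparse-reward, goal-based domains:
    walk n random steps from `start` and treat the endpoint as the goal, so the
    first training problems are easy enough for an uninformed policy; n can be
    increased as the learned policy improves."""
    state = start
    for _ in range(n):
        state = random_step(state)
    return state
```

Here learn_policy merely stands in for the relational policy language and learner that the abstract introduces; any concrete classifier over state features would fit this interface.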

Detailed Description

Saved in:
Bibliographic Details
Published in: The Journal of artificial intelligence research, 2006-01, Vol. 25, p. 75-118
Main authors: Fern, A., Yoon, S., Givan, R.
Format: Article
Language: English
Subjects: Artificial intelligence; Domains; Learning; Markov analysis; Markov processes; Policies; Random walk
Online access: Full text
container_end_page 118
container_issue
container_start_page 75
container_title The Journal of artificial intelligence research
container_volume 25
creator Fern, A.
Yoon, S.
Givan, R.
description We study an approach to policy selection for large relational Markov Decision Processes (MDPs). We consider a variant of approximate policy iteration (API) that replaces the usual value-function learning step with a learning step in policy space. This is advantageous in domains where good policies are easier to represent and learn than the corresponding value functions, which is often the case for the relational MDPs we are interested in. In order to apply API to such problems, we introduce a relational policy language and corresponding learner. In addition, we introduce a new bootstrapping routine for goal-based planning domains, based on random walks. Such bootstrapping is necessary for many large relational MDPs, where reward is extremely sparse, as API is ineffective in such domains when initialized with an uninformed policy. Our experiments show that the resulting system is able to find good policies for a number of classical planning domains and their stochastic variants by solving them as extremely large relational MDPs. The experiments also point to some limitations of our approach, suggesting future work.
doi_str_mv 10.1613/jair.1700
format Article
fulltext fulltext
identifier ISSN: 1076-9757
ispartof The Journal of artificial intelligence research, 2006-01, Vol.25, p.75-118
issn 1076-9757
1076-9757
1943-5037
language eng
recordid cdi_proquest_journals_2554122364
source DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek (freely accessible e-journals); Free E-Journals
subjects Artificial intelligence
Domains
Learning
Markov analysis
Markov processes
Policies
Random walk
title Approximate Policy Iteration with a Policy Language Bias: Solving Relational Markov Decision Processes
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-01T19%3A25%3A07IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Approximate%20Policy%20Iteration%20with%20a%20Policy%20Language%20Bias:%20Solving%20Relational%20Markov%20Decision%20Processes&rft.jtitle=The%20Journal%20of%20artificial%20intelligence%20research&rft.au=Fern,%20A.&rft.date=2006-01-01&rft.volume=25&rft.spage=75&rft.epage=118&rft.pages=75-118&rft.issn=1076-9757&rft.eissn=1076-9757&rft_id=info:doi/10.1613/jair.1700&rft_dat=%3Cproquest_cross%3E2554122364%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2554122364&rft_id=info:pmid/&rfr_iscdi=true