Approximate Policy Iteration with a Policy Language Bias: Solving Relational Markov Decision Processes

We study an approach to policy selection for large relational Markov Decision Processes (MDPs). We consider a variant of approximate policy iteration (API) that replaces the usual value-function learning step with a learning step in policy space. This is advantageous in domains where good policies are easier to represent and learn than the corresponding value functions, which is often the case for the relational MDPs we are interested in. In order to apply API to such problems, we introduce a relational policy language and corresponding learner. In addition, we introduce a new bootstrapping routine for goal-based planning domains, based on random walks. Such bootstrapping is necessary for many large relational MDPs, where reward is extremely sparse, as API is ineffective in such domains when initialized with an uninformed policy. Our experiments show that the resulting system is able to find good policies for a number of classical planning domains and their stochastic variants by solving them as extremely large relational MDPs. The experiments also point to some limitations of our approach, suggesting future work.
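The abstract's central idea, running policy iteration with a supervised policy-learning step in place of value-function fitting, bootstrapped by random-walk problems when reward is sparse, can be illustrated with a minimal sketch. The code below is an assumption-laden illustration, not the authors' system: every identifier (rollout_value, policy_space_api, random_walk_goal, the environment callables) is hypothetical, and the paper's relational policy language and learner are abstracted into a generic learn_policy classifier.

```python
# Illustrative sketch only: approximate policy iteration where improvement is
# done by policy rollouts and the "learning step" fits a policy (a state ->
# action classifier) rather than a value function. All names are hypothetical.
from typing import Callable, Hashable, List, Tuple

State = Hashable
Action = str

def rollout_value(env_step: Callable[[State, Action], Tuple[State, float]],
                  policy: Callable[[State], Action],
                  state: State, action: Action,
                  horizon: int = 20, width: int = 5, gamma: float = 0.95) -> float:
    """Monte-Carlo estimate of Q^pi(state, action): take `action`, then follow
    `policy` for `horizon` steps, averaged over `width` sampled trajectories."""
    total = 0.0
    for _ in range(width):
        s, a, ret, discount = state, action, 0.0, 1.0
        for _ in range(horizon):
            s, r = env_step(s, a)
            ret += discount * r
            discount *= gamma
            a = policy(s)
        total += ret
    return total / width

def policy_space_api(env_step: Callable[[State, Action], Tuple[State, float]],
                     actions: List[Action],
                     training_states: List[State],
                     learn_policy: Callable[[List[Tuple[State, Action]]],
                                            Callable[[State], Action]],
                     initial_policy: Callable[[State], Action],
                     iterations: int = 5) -> Callable[[State], Action]:
    """API loop with a policy-space learning step: label each training state
    with the action whose rollout value under the current policy is best, then
    fit the next policy to those (state, action) pairs by supervised learning."""
    policy = initial_policy
    for _ in range(iterations):
        dataset = [(s, max(actions,
                           key=lambda a: rollout_value(env_step, policy, s, a)))
                   for s in training_states]
        policy = learn_policy(dataset)  # stands in for the relational learner
    return policy

def random_walk_goal(random_step: Callable[[State], State],
                     start: State, n: int) -> State:
    """Random-walk bootstrapping idea for sparse-reward, goal-based domains:
    walk n random steps from `start` and treat the endpoint as the goal, so the
    first training problems are easy enough for an uninformed policy; n can be
    increased as the learned policy improves."""
    state = start
    for _ in range(n):
        state = random_step(state)
    return state
```

Here learn_policy merely stands in for the relational policy language and learner that the abstract introduces; any concrete classifier over state features would fit this interface.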

Detailed Description

Saved in:
Bibliographic Details
Published in: The Journal of artificial intelligence research, 2006-01, Vol. 25, p. 75-118
Main authors: Fern, A., Yoon, S., Givan, R.
Format: Article
Language: English
Subjects: Artificial intelligence; Domains; Learning; Markov analysis; Markov processes; Policies; Random walk
Online access: Full text
container_end_page 118
container_issue
container_start_page 75
container_title The Journal of artificial intelligence research
container_volume 25
creator Fern, A.
Yoon, S.
Givan, R.
description We study an approach to policy selection for large relational Markov Decision Processes (MDPs). We consider a variant of approximate policy iteration (API) that replaces the usual value-function learning step with a learning step in policy space. This is advantageous in domains where good policies are easier to represent and learn than the corresponding value functions, which is often the case for the relational MDPs we are interested in. In order to apply API to such problems, we introduce a relational policy language and corresponding learner. In addition, we introduce a new bootstrapping routine for goal-based planning domains, based on random walks. Such bootstrapping is necessary for many large relational MDPs, where reward is extremely sparse, as API is ineffective in such domains when initialized with an uninformed policy. Our experiments show that the resulting system is able to find good policies for a number of classical planning domains and their stochastic variants by solving them as extremely large relational MDPs. The experiments also point to some limitations of our approach, suggesting future work.
doi_str_mv 10.1613/jair.1700
format Article
fulltext fulltext
identifier ISSN: 1076-9757
ispartof The Journal of artificial intelligence research, 2006-01, Vol.25, p.75-118
issn 1076-9757
1076-9757
1943-5037
language eng
recordid cdi_proquest_journals_2554122364
source DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek (freely accessible e-journals); Free E-Journals
subjects Artificial intelligence
Domains
Learning
Markov analysis
Markov processes
Policies
Random walk
title Approximate Policy Iteration with a Policy Language Bias: Solving Relational Markov Decision Processes
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-01T19%3A25%3A07IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Approximate%20Policy%20Iteration%20with%20a%20Policy%20Language%20Bias:%20Solving%20Relational%20Markov%20Decision%20Processes&rft.jtitle=The%20Journal%20of%20artificial%20intelligence%20research&rft.au=Fern,%20A.&rft.date=2006-01-01&rft.volume=25&rft.spage=75&rft.epage=118&rft.pages=75-118&rft.issn=1076-9757&rft.eissn=1076-9757&rft_id=info:doi/10.1613/jair.1700&rft_dat=%3Cproquest_cross%3E2554122364%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2554122364&rft_id=info:pmid/&rfr_iscdi=true