Approximate Policy Iteration with a Policy Language Bias: Solving Relational Markov Decision Processes
We study an approach to policy selection for large relational Markov Decision Processes (MDPs). We consider a variant of approximate policy iteration (API) that replaces the usual value-function learning step with a learning step in policy space. This is advantageous in domains where good policies are easier to represent and learn than the corresponding value functions, which is often the case for the relational MDPs we are interested in.
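The abstract describes a variant of approximate policy iteration whose improvement step learns a new policy directly from rollout data rather than fitting a value function. The sketch below illustrates only that general idea and is not the authors' system: the simulator interface (`sample_next`, `actions`), `learn_policy`, and `sample_states` are hypothetical stand-ins, and in the paper the learner would be a relational policy-language learner.

```python
def rollout_value(sim, state, action, policy, horizon, gamma, num_trials):
    """Estimate the value of taking `action` in `state` and then following
    `policy`, by averaging discounted returns over sampled trajectories."""
    total = 0.0
    for _ in range(num_trials):
        s, a = state, action
        ret, discount = 0.0, 1.0
        for _ in range(horizon):
            s, r = sim.sample_next(s, a)   # sample one stochastic transition
            ret += discount * r
            discount *= gamma
            a = policy(s)                  # after the first step, follow the policy
        total += ret
    return total / num_trials


def policy_space_api(sim, initial_policy, learn_policy, sample_states,
                     iterations=10, horizon=50, gamma=0.95, trials=20):
    """Approximate policy iteration where the improvement step learns a new
    policy directly from (state, improved action) examples, instead of
    fitting a value function."""
    policy = initial_policy
    for _ in range(iterations):
        examples = []
        for s in sample_states():
            # Greedy improvement: pick the action with the best rollout value.
            best_a = max(sim.actions(s),
                         key=lambda a: rollout_value(sim, s, a, policy,
                                                     horizon, gamma, trials))
            examples.append((s, best_a))
        policy = learn_policy(examples)    # e.g. a relational policy learner
    return policy
```

In sparse-reward, goal-based domains like those in the abstract, `sample_states` would presumably be supplied by the random-walk bootstrapping routine the authors mention, which generates progressively harder training problems so that an uninformed initial policy can still be improved.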
Published in: | The Journal of artificial intelligence research 2006-01, Vol.25, p.75-118 |
---|---|
Main authors: | Fern, A.; Yoon, S.; Givan, R. |
Format: | Article |
Language: | eng |
Subjects: | Artificial intelligence; Domains; Learning; Markov analysis; Markov processes; Policies; Random walk |
Online access: | Full text |
container_end_page | 118 |
---|---|
container_issue | |
container_start_page | 75 |
container_title | The Journal of artificial intelligence research |
container_volume | 25 |
creator | Fern, A.; Yoon, S.; Givan, R. |
description | We study an approach to policy selection for large relational Markov Decision Processes (MDPs). We consider a variant of approximate policy iteration (API) that replaces the usual value-function learning step with a learning step in policy space. This is advantageous in domains where good policies are easier to represent and learn than the corresponding value functions, which is often the case for the relational MDPs we are interested in. In order to apply API to such problems, we introduce a relational policy language and corresponding learner. In addition, we introduce a new bootstrapping routine for goal-based planning domains, based on random walks. Such bootstrapping is necessary for many large relational MDPs, where reward is extremely sparse, as API is ineffective in such domains when initialized with an uninformed policy. Our experiments show that the resulting system is able to find good policies for a number of classical planning domains and their stochastic variants by solving them as extremely large relational MDPs. The experiments also point to some limitations of our approach, suggesting future work. |
doi_str_mv | 10.1613/jair.1700 |
format | Article |
fulltext | fulltext |
identifier | ISSN: 1076-9757 |
ispartof | The Journal of artificial intelligence research, 2006-01, Vol.25, p.75-118 |
issn | 1076-9757 1076-9757 1943-5037 |
language | eng |
recordid | cdi_proquest_journals_2554122364 |
source | DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; Free E-Journals |
subjects | Artificial intelligence; Domains; Learning; Markov analysis; Markov processes; Policies; Random walk |
title | Approximate Policy Iteration with a Policy Language Bias: Solving Relational Markov Decision Processes |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-01T19%3A25%3A07IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Approximate%20Policy%20Iteration%20with%20a%20Policy%20Language%20Bias:%20Solving%20Relational%20Markov%20Decision%20Processes&rft.jtitle=The%20Journal%20of%20artificial%20intelligence%20research&rft.au=Fern,%20A.&rft.date=2006-01-01&rft.volume=25&rft.spage=75&rft.epage=118&rft.pages=75-118&rft.issn=1076-9757&rft.eissn=1076-9757&rft_id=info:doi/10.1613/jair.1700&rft_dat=%3Cproquest_cross%3E2554122364%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2554122364&rft_id=info:pmid/&rfr_iscdi=true |