Performance Loss Bounds for Approximate Value Iteration with State Aggregation

We consider approximate value iteration with a parameterized approximator in which the state space is partitioned and the optimal cost-to-go function over each partition is approximated by a constant. We establish performance loss bounds for policies derived from approximations associated with fixed points. These bounds identify benefits to using invariant distributions of appropriate policies as projection weights. Such projection weighting relates to what is done by temporal-difference learning. Our analysis also leads to the first performance loss bound for approximate value iteration with an average-cost objective.
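The abstract describes value iteration composed with a weighted projection onto functions that are constant over each partition. A minimal sketch of that fixed-point iteration on a small, randomly generated MDP follows; the MDP, the partition, and the uniform projection weights are illustrative assumptions, not taken from the paper (which advocates invariant-distribution weights):

```python
import numpy as np

# Illustrative sketch, not the paper's exact formulation: discounted-cost MDP
# with n states and m actions; the cost-to-go is approximated by one constant
# per partition, and the projection averages within each partition using
# weights w (uniform here; the paper motivates invariant-distribution weights).

rng = np.random.default_rng(0)
n, m, gamma = 6, 2, 0.9
P = rng.dirichlet(np.ones(n), size=(m, n))   # P[a, s, :] = transition probs
g = rng.uniform(0.0, 1.0, size=(m, n))       # g[a, s]    = one-stage costs
partition = np.array([0, 0, 1, 1, 2, 2])     # three aggregate states
w = np.full(n, 1.0 / n)                      # projection weights (assumed uniform)

def bellman(J):
    # (TJ)(s) = min_a [ g(a, s) + gamma * sum_s' P(a, s, s') J(s') ]
    return np.min(g + gamma * P @ J, axis=0)

def project(J):
    # Weighted projection onto piecewise-constant functions: within each
    # partition, replace J by its w-weighted average.
    out = np.empty_like(J)
    for k in np.unique(partition):
        mask = partition == k
        out[mask] = np.dot(w[mask], J[mask]) / w[mask].sum()
    return out

# Iterate J <- Pi T J toward the fixed point of the projected Bellman operator.
J = np.zeros(n)
for _ in range(500):
    J = project(bellman(J))

# Greedy policy derived from the fixed-point approximation.
policy = np.argmin(g + gamma * P @ J, axis=0)
```

Since the projection is nonexpansive in the sup norm and the Bellman operator is a gamma-contraction, the composed iteration converges to a unique fixed point, from which the greedy policy above is read off.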

Bibliographic Details
Published in: Mathematics of Operations Research, 2006-05, Vol. 31 (2), p. 234-244
Main Author: Van Roy, Benjamin
Format: Article
Language: English
Online Access: Full Text
DOI: 10.1287/moor.1060.0188
ISSN: 0364-765X
EISSN: 1526-5471
Source: Informs; JSTOR Mathematics & Statistics; EBSCOhost Business Source Complete; JSTOR Archive Collection A-Z Listing
Subjects:
Aggregation
Analysis
approximate value iteration
Approximate values
Approximation
Approximations
Difference equations
Dynamic programming
Iterative solutions
Machine learning
Markov processes
Mathematical aptitude
Mathematical functions
Mathematical theorems
Operations research
Optimization algorithms
state aggregation
Studies
temporal-difference learning