Performance Loss Bounds for Approximate Value Iteration with State Aggregation
We consider approximate value iteration with a parameterized approximator in which the state space is partitioned and the optimal cost-to-go function over each partition is approximated by a constant. We establish performance loss bounds for policies derived from approximations associated with fixed points. These bounds identify benefits to using invariant distributions of appropriate policies as projection weights. Such projection weighting relates to what is done by temporal-difference learning. Our analysis also leads to the first performance loss bound for approximate value iteration with an average-cost objective.
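The setup described in the abstract can be sketched in code: partition the state space, represent the cost-to-go as one constant per partition, and repeatedly apply a Bellman backup followed by a weighted projection onto piecewise-constant functions until the constants settle at a fixed point. This is an illustrative sketch only, not the paper's analysis: the function name, the toy MDP, and the uniform projection weights `w` are assumptions made here for the example (the paper's bounds favor weights drawn from an invariant distribution of an appropriate policy).

```python
import numpy as np

def aggregated_value_iteration(P, g, alpha, partition, w, n_sweeps=500):
    """Approximate value iteration with state aggregation (a sketch).

    P         : (A, S, S) transition probabilities P[a, s, s']
    g         : (S, A) one-stage costs
    alpha     : discount factor in (0, 1)
    partition : length-S array of partition ids (0..K-1)
    w         : length-S nonnegative projection weights
    """
    A, S, _ = P.shape
    K = int(partition.max()) + 1
    r = np.zeros(K)                            # one constant per partition
    for _ in range(n_sweeps):
        J = r[partition]                       # piecewise-constant cost-to-go, shape (S,)
        # Bellman backup: (T J)(s) = min_a [ g(s,a) + alpha * sum_s' P(a,s,s') J(s') ]
        TJ = (g + alpha * np.einsum('ast,t->sa', P, J)).min(axis=1)
        # Weighted projection: each constant is the w-weighted average over its partition
        r = np.array([np.average(TJ[partition == k], weights=w[partition == k])
                      for k in range(K)])
    # Greedy policy with respect to the fixed-point approximation
    policy = (g + alpha * np.einsum('ast,t->sa', P, r[partition])).argmin(axis=1)
    return r, policy

# Toy usage on a random 6-state, 2-action MDP with two partitions
rng = np.random.default_rng(0)
S, A = 6, 2
P = rng.random((A, S, S))
P /= P.sum(axis=2, keepdims=True)              # normalize rows to probabilities
g = rng.random((S, A))
partition = np.array([0, 0, 0, 1, 1, 1])
w = np.full(S, 1.0 / S)                        # uniform weights, for the sketch only
r, policy = aggregated_value_iteration(P, g, 0.9, partition, w)
```

With a discount factor of 0.9, the composition of backup and nonnegative-weight averaging is a sup-norm contraction, so the constants `r` converge geometrically to the fixed point whose greedy policy the paper's bounds concern.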
Published in: | Mathematics of operations research, 2006-05, Vol. 31 (2), p. 234-244 |
---|---|
Author: | Van Roy, Benjamin |
Format: | Article |
Language: | English |
Subjects: | approximate value iteration; state aggregation; temporal-difference learning |
Online access: | Full text |
container_end_page | 244 |
---|---|
container_issue | 2 |
container_start_page | 234 |
container_title | Mathematics of operations research |
container_volume | 31 |
creator | Van Roy, Benjamin |
description | We consider approximate value iteration with a parameterized approximator in which the state space is partitioned and the optimal cost-to-go function over each partition is approximated by a constant. We establish performance loss bounds for policies derived from approximations associated with fixed points. These bounds identify benefits to using invariant distributions of appropriate policies as projection weights. Such projection weighting relates to what is done by temporal-difference learning. Our analysis also leads to the first performance loss bound for approximate value iteration with an average-cost objective. |
doi_str_mv | 10.1287/moor.1060.0188 |
format | Article |
publisher | Linthicum: INFORMS |
fulltext | fulltext |
identifier | ISSN: 0364-765X |
ispartof | Mathematics of operations research, 2006-05, Vol.31 (2), p.234-244 |
issn | 0364-765X 1526-5471 |
language | eng |
recordid | cdi_jstor_primary_25151721 |
source | Informs; JSTOR Mathematics & Statistics; EBSCOhost Business Source Complete; JSTOR Archive Collection A-Z Listing |
subjects | Aggregation; Analysis; approximate value iteration; Approximate values; Approximation; Approximations; Difference equations; Dynamic programming; Iterative solutions; Machine learning; Markov processes; Mathematical aptitude; Mathematical functions; Mathematical theorems; Operations research; Optimization algorithms; state aggregation; Studies; temporal-difference learning |
title | Performance Loss Bounds for Approximate Value Iteration with State Aggregation |