Kernel-Based Approximate Dynamic Programming Using Bellman Residual Elimination

creator Bethke, Brett M
description Many sequential decision-making problems related to multi-agent robotic systems can be naturally posed as Markov Decision Processes (MDPs). An important advantage of the MDP framework is the ability to utilize stochastic system models, thereby allowing the system to make sound decisions even if there is randomness in the system evolution over time. Unfortunately, the curse of dimensionality prevents most MDPs of practical size from being solved exactly. One main focus of the thesis is the development of a new family of algorithms for computing approximate solutions to large-scale MDPs. Our algorithms are similar in spirit to Bellman residual methods, which attempt to minimize the error incurred in solving Bellman's equation at a set of sample states. However, by exploiting kernel-based regression techniques (such as support vector regression and Gaussian process regression) with nondegenerate kernel functions as the underlying cost-to-go function approximation architecture, our algorithms are able to construct cost-to-go solutions for which the Bellman residuals are explicitly forced to zero at the sample states. For this reason, we have named our approach Bellman residual elimination (BRE). In addition to developing the basic ideas behind BRE, we present multi-stage and model-free extensions to the approach. The multi-stage extension allows for automatic selection of an appropriate kernel for the MDP at hand, while the model-free extension can use simulated or real state trajectory data to learn an approximate policy when a system model is unavailable. (See the illustrative sketch following this record.)
format Report
creatorcontrib MASSACHUSETTS INST OF TECH CAMBRIDGE
creationdate 2010-02
rights Approved for public release; distribution is unlimited.
language eng
recordid cdi_dtic_stinet_ADA528927
source DTIC Technical Reports
subjects ALGORITHMS
BRE(BELLMAN RESIDUAL ELIMINATION)
DECISION MAKING
DYNAMIC PROGRAMMING
ELIMINATION
KERNEL FUNCTIONS
MARKOV PROCESSES
MDPS(MARKOV DECISION PROCESSES)
Statistics and Probability
THESES
title Kernel-Based Approximate Dynamic Programming Using Bellman Residual Elimination
url https://apps.dtic.mil/sti/citations/ADA528927
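
Illustrative sketch of the zero-residual idea. The description above states that BRE builds a kernel-based cost-to-go approximation whose Bellman residuals are exactly zero at a set of sample states. The Python sketch below shows one minimal way to realize that property for a fixed policy on a small synthetic MDP: the cost-to-go is written as a kernel expansion over the sample states, and the kernel weights are found by solving the linear system that sets the Bellman residual to zero at each sample. This is an assumption-laden illustration, not the report's support vector regression or Gaussian process regression formulation; the Gaussian kernel, the random MDP, the one-dimensional state embedding x, and helper names such as gaussian_kernel are all hypothetical choices made here for clarity.

import numpy as np

# Minimal sketch of the core idea behind Bellman residual elimination (BRE):
# for a fixed policy, represent the cost-to-go as a kernel expansion over a
# set of sample states and solve a linear system so that the Bellman residual
# is exactly zero at those samples. Illustration only, not the report's
# support vector / Gaussian process formulation.

rng = np.random.default_rng(0)

n_states = 20      # small synthetic MDP so every quantity can be checked directly
alpha = 0.95       # discount factor

# Random transition matrix P[s, s'] and stage costs g[s] under a fixed policy.
P = rng.random((n_states, n_states))
P /= P.sum(axis=1, keepdims=True)
g = rng.random(n_states)

# Assumed one-dimensional embedding of the states, used only by the kernel.
x = np.linspace(0.0, 1.0, n_states)

def gaussian_kernel(a, b, length_scale=0.2):
    """Nondegenerate (Gaussian) kernel on the state embedding."""
    return np.exp(-((a - b) ** 2) / (2.0 * length_scale ** 2))

# Sample states at which the Bellman residual will be forced to zero.
samples = np.sort(rng.choice(n_states, size=8, replace=False))

# Cost-to-go approximation: J_hat(s) = sum_j w[j] * k(x[s_j], x[s]).
K = gaussian_kernel(x[samples][:, None], x[None, :])   # shape (m, n): K[j, s] = k(s_j, s)

# Bellman residual at a sample state s_i:
#   BR(s_i) = J_hat(s_i) - g(s_i) - alpha * sum_{s'} P[s_i, s'] * J_hat(s')
# Requiring BR(s_i) = 0 at every sample gives the linear system A w = g[samples],
# with A[i, j] = k(s_j, s_i) - alpha * sum_{s'} P[s_i, s'] * k(s_j, s').
A = K[:, samples].T - alpha * P[samples] @ K.T
w = np.linalg.solve(A, g[samples])

# Evaluate the kernel expansion at every state and check the residuals.
J_hat = K.T @ w
residuals = J_hat - (g + alpha * P @ J_hat)
print("max |Bellman residual| at sample states:", np.abs(residuals[samples]).max())
print("max |Bellman residual| at other states :", np.abs(np.delete(residuals, samples)).max())

Running the sketch prints residuals at the sample states that are zero up to floating-point precision, while residuals at the remaining states are typically nonzero. The use of a nondegenerate kernel matters here: for distinct sample states it yields a generically well-conditioned linear system, which mirrors the report's point that nondegenerate kernel functions allow the Bellman residuals to be eliminated exactly at the samples.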