Least Absolute Policy Iteration — A Robust Approach to Value Function Approximation

Least-squares policy iteration is a useful reinforcement learning method in robotics due to its computational efficiency. However, it tends to be sensitive to outliers in observed rewards. In this paper, we propose an alternative method that employs the absolute loss for enhancing robustness and reliability. The proposed method is formulated as a linear programming problem which can be solved efficiently by standard optimization software, so the computational advantage is not sacrificed for gaining robustness and reliability. We demonstrate the usefulness of the proposed approach through a simulated robot-control task.

Detailed description

Saved in:
Bibliographic details
Published in: IEICE Transactions on Information and Systems, 2010/09/01, Vol.E93.D(9), pp.2555-2565
Main authors: SUGIYAMA, Masashi, HACHIYA, Hirotaka, KASHIMA, Hisashi, MORIMURA, Tetsuro
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 2565
container_issue 9
container_start_page 2555
container_title IEICE Transactions on Information and Systems
container_volume E93.D
creator SUGIYAMA, Masashi
HACHIYA, Hirotaka
KASHIMA, Hisashi
MORIMURA, Tetsuro
description Least-squares policy iteration is a useful reinforcement learning method in robotics due to its computational efficiency. However, it tends to be sensitive to outliers in observed rewards. In this paper, we propose an alternative method that employs the absolute loss for enhancing robustness and reliability. The proposed method is formulated as a linear programming problem which can be solved efficiently by standard optimization software, so the computational advantage is not sacrificed for gaining robustness and reliability. We demonstrate the usefulness of the proposed approach through a simulated robot-control task.
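The abstract's key computational point, that swapping the squared loss for the absolute loss still yields a tractable problem, can be illustrated with a small least-absolute regression solved as a linear program. This is a hedged sketch of the generic L1-fitting trick, not the paper's actual least absolute policy iteration algorithm; it assumes NumPy and SciPy (`scipy.optimize.linprog`) are available.

```python
import numpy as np
from scipy.optimize import linprog

def least_absolute_fit(Phi, y):
    """Fit weights w minimizing sum_i |y_i - Phi[i] @ w| via linear programming.

    The standard reformulation introduces slacks u_i >= |y_i - Phi[i] @ w|
    and minimizes sum_i u_i, which is linear in the stacked vector x = [w, u].
    """
    n, d = Phi.shape
    # Objective: zero cost on w, unit cost on each slack u_i.
    c = np.concatenate([np.zeros(d), np.ones(n)])
    # Encode |y - Phi w| <= u as two one-sided inequalities:
    #   Phi w - u <= y   and   -Phi w - u <= -y.
    A_ub = np.block([[Phi, -np.eye(n)], [-Phi, -np.eye(n)]])
    b_ub = np.concatenate([y, -y])
    bounds = [(None, None)] * d + [(0, None)] * n  # w free, u >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:d]

# Tiny demo: 20 points exactly on the line 2 + 0.5*t, plus one large
# reward-style outlier. The L1 fit recovers the line despite the outlier.
Phi = np.column_stack([np.ones(20), np.arange(20.0)])
y = 2.0 + 0.5 * np.arange(20.0)
y[3] += 50.0
w = least_absolute_fit(Phi, y)
```

A squared-loss fit on the same data would be pulled noticeably toward the corrupted point; the L1 objective pays for the outlier only linearly, which is the robustness property the abstract emphasizes.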
doi_str_mv 10.1587/transinf.E93.D.2555
format Article
publisher Oxford: The Institute of Electronics, Information and Communication Engineers
rights 2010 The Institute of Electronics, Information and Communication Engineers
rights 2015 INIST-CNRS
fulltext fulltext
identifier ISSN: 0916-8532
ispartof IEICE Transactions on Information and Systems, 2010/09/01, Vol.E93.D(9), pp.2555-2565
issn 0916-8532
1745-1361
1745-1361
language eng
recordid cdi_proquest_miscellaneous_1671442176
source J-STAGE Free; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals
subjects Applied sciences
Computational efficiency
Computer programs
Computer science
control theory
systems
Computer simulation
Control theory. Systems
Exact sciences and technology
Iterative methods
l1-loss function
least-squares policy iteration
linear programming
Mathematical analysis
outlier
Policies
Reinforcement
reinforcement learning
Robotics
Robustness
value function approximation
title Least Absolute Policy Iteration — A Robust Approach to Value Function Approximation