Least Absolute Policy Iteration — A Robust Approach to Value Function Approximation
Least-squares policy iteration is a useful reinforcement learning method in robotics due to its computational efficiency. However, it tends to be sensitive to outliers in observed rewards. In this paper, we propose an alternative method that employs the absolute loss for enhancing robustness and reliability. The proposed method is formulated as a linear programming problem which can be solved efficiently by standard optimization software, so the computational advantage is not sacrificed for gaining robustness and reliability. We demonstrate the usefulness of the proposed approach through a simulated robot-control task.
Saved in:
Published in: | IEICE Transactions on Information and Systems, 2010/09/01, Vol.E93.D(9), pp.2555-2565 |
---|---|
Main authors: | SUGIYAMA, Masashi; HACHIYA, Hirotaka; KASHIMA, Hisashi; MORIMURA, Tetsuro |
Format: | Article |
Language: | eng |
Keywords: | reinforcement learning; least-squares policy iteration; value function approximation; l1-loss function; linear programming; outlier |
Online access: | Full text |
container_end_page | 2565 |
---|---|
container_issue | 9 |
container_start_page | 2555 |
container_title | IEICE Transactions on Information and Systems |
container_volume | E93.D |
creator | SUGIYAMA, Masashi; HACHIYA, Hirotaka; KASHIMA, Hisashi; MORIMURA, Tetsuro |
description | Least-squares policy iteration is a useful reinforcement learning method in robotics due to its computational efficiency. However, it tends to be sensitive to outliers in observed rewards. In this paper, we propose an alternative method that employs the absolute loss for enhancing robustness and reliability. The proposed method is formulated as a linear programming problem which can be solved efficiently by standard optimization software, so the computational advantage is not sacrificed for gaining robustness and reliability. We demonstrate the usefulness of the proposed approach through a simulated robot-control task. |
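The abstract's key computational claim is that replacing the squared loss with the absolute loss still reduces to a linear program solvable by standard optimization software. The sketch below is not the authors' code; it illustrates the same reduction on plain least-absolute-deviations regression (the `lad_fit` helper and toy data are illustrative), using `scipy.optimize.linprog`:

```python
import numpy as np
from scipy.optimize import linprog

def lad_fit(Phi, r):
    """Fit weights w minimizing sum_i |r_i - Phi[i] @ w| via a linear program.

    The absolute loss is recast with slack variables t_i >= |residual_i|:
        minimize    sum_i t_i
        subject to  Phi w - t <= r   and   -Phi w - t <= -r
    """
    n, d = Phi.shape
    c = np.concatenate([np.zeros(d), np.ones(n)])   # objective: sum of slacks
    A_ub = np.block([[Phi, -np.eye(n)],
                     [-Phi, -np.eye(n)]])           # encodes |Phi w - r| <= t
    b_ub = np.concatenate([r, -r])
    bounds = [(None, None)] * d + [(0, None)] * n   # w free, slacks nonnegative
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:d]

# With a single constant feature, the absolute-loss fit recovers the median,
# so the outlier observation (10.0) does not drag the estimate the way the
# squared-loss fit (the mean) would.
Phi = np.ones((3, 1))
r = np.array([1.0, 2.0, 10.0])
w = lad_fit(Phi, r)
```

This robustness-to-outliers behavior of the l1 loss is the same property the paper exploits for value function approximation with outlier-contaminated rewards.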
doi_str_mv | 10.1587/transinf.E93.D.2555 |
format | Article |
publisher | The Institute of Electronics, Information and Communication Engineers (Oxford) |
rights | 2010 The Institute of Electronics, Information and Communication Engineers |
fulltext | fulltext |
identifier | ISSN: 0916-8532 |
ispartof | IEICE Transactions on Information and Systems, 2010/09/01, Vol.E93.D(9), pp.2555-2565 |
issn | 0916-8532 (print); 1745-1361 (electronic) |
language | eng |
recordid | cdi_proquest_miscellaneous_1671442176 |
source | J-STAGE Free; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals |
subjects | Applied sciences; Computational efficiency; Computer programs; Computer science, control theory, systems; Computer simulation; Control theory. Systems; Exact sciences and technology; Iterative methods; l1-loss function; least-squares policy iteration; linear programming; Mathematical analysis; outlier; Policies; Reinforcement; reinforcement learning; Robotics; Robustness; value function approximation |
title | Least Absolute Policy Iteration — A Robust Approach to Value Function Approximation |