Least Absolute Policy Iteration — A Robust Approach to Value Function Approximation
Least-squares policy iteration is a useful reinforcement learning method in robotics due to its computational efficiency. However, it tends to be sensitive to outliers in observed rewards. In this paper, we propose an alternative method that employs the absolute loss for enhancing robustness and reliability. The proposed method is formulated as a linear programming problem which can be solved efficiently by standard optimization software, so the computational advantage is not sacrificed for gaining robustness and reliability. We demonstrate the usefulness of the proposed approach through a simulated robot-control task.
Saved in:
Published in: | IEICE Transactions on Information and Systems, 2010/09/01, Vol.E93.D(9), pp.2555-2565 |
---|---|
Main authors: | SUGIYAMA, Masashi; HACHIYA, Hirotaka; KASHIMA, Hisashi; MORIMURA, Tetsuro |
Format: | Article |
Language: | eng |
Keywords: | reinforcement learning; least-squares policy iteration; value function approximation; l1-loss function; linear programming; outlier |
Online access: | Full text |
container_end_page | 2565 |
---|---|
container_issue | 9 |
container_start_page | 2555 |
container_title | IEICE Transactions on Information and Systems |
container_volume | E93.D |
creator | SUGIYAMA, Masashi; HACHIYA, Hirotaka; KASHIMA, Hisashi; MORIMURA, Tetsuro |
description | Least-squares policy iteration is a useful reinforcement learning method in robotics due to its computational efficiency. However, it tends to be sensitive to outliers in observed rewards. In this paper, we propose an alternative method that employs the absolute loss for enhancing robustness and reliability. The proposed method is formulated as a linear programming problem which can be solved efficiently by standard optimization software, so the computational advantage is not sacrificed for gaining robustness and reliability. We demonstrate the usefulness of the proposed approach through a simulated robot-control task. |
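The abstract's key computational claim is that replacing the squared loss with the absolute loss still reduces to a linear program solvable by standard optimization software. The sketch below is not the authors' code; it illustrates the same reduction on plain least-absolute-deviations regression (the `lad_fit` helper and toy data are illustrative), using `scipy.optimize.linprog`:

```python
import numpy as np
from scipy.optimize import linprog

def lad_fit(Phi, r):
    """Fit weights w minimizing sum_i |r_i - Phi[i] @ w| via a linear program.

    The absolute loss is recast with slack variables t_i >= |residual_i|:
        minimize    sum_i t_i
        subject to  Phi w - t <= r   and   -Phi w - t <= -r
    """
    n, d = Phi.shape
    c = np.concatenate([np.zeros(d), np.ones(n)])   # objective: sum of slacks
    A_ub = np.block([[Phi, -np.eye(n)],
                     [-Phi, -np.eye(n)]])           # encodes |Phi w - r| <= t
    b_ub = np.concatenate([r, -r])
    bounds = [(None, None)] * d + [(0, None)] * n   # w free, slacks nonnegative
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:d]

# With a single constant feature, the absolute-loss fit recovers the median,
# so the outlier observation (10.0) does not drag the estimate the way the
# squared-loss fit (the mean) would.
Phi = np.ones((3, 1))
r = np.array([1.0, 2.0, 10.0])
w = lad_fit(Phi, r)
```

This robustness-to-outliers behavior of the l1 loss is the same property the paper exploits for value function approximation with outlier-contaminated rewards.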
doi_str_mv | 10.1587/transinf.E93.D.2555 |
format | Article |
publisher | The Institute of Electronics, Information and Communication Engineers (Oxford) |
rights | 2010 The Institute of Electronics, Information and Communication Engineers |
fulltext | fulltext |
identifier | ISSN: 0916-8532 |
ispartof | IEICE Transactions on Information and Systems, 2010/09/01, Vol.E93.D(9), pp.2555-2565 |
issn | 0916-8532 (print); 1745-1361 (electronic) |
language | eng |
recordid | cdi_proquest_miscellaneous_1671442176 |
source | J-STAGE Free; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals |
subjects | Applied sciences; Computational efficiency; Computer programs; Computer science, control theory, systems; Computer simulation; Control theory. Systems; Exact sciences and technology; Iterative methods; l1-loss function; least-squares policy iteration; linear programming; Mathematical analysis; outlier; Policies; Reinforcement; reinforcement learning; Robotics; Robustness; value function approximation |
title | Least Absolute Policy Iteration — A Robust Approach to Value Function Approximation |