ACRE: Actor-Critic with Reward-Preserving Exploration

While reinforcement learning (RL) algorithms have generated impressive strategies for a wide range of tasks, performance improvements in continuous-domain, real-world problems have not followed the same trend, with poor exploration and quick convergence to locally optimal solutions playing a dominant role. Advanced RL algorithms attempt to mitigate this issue by introducing exploration signals during the training procedure, paving the way for signals from an intrinsic exploration branch. The ACRE algorithm is a framework that concretely describes the conditions for such an integration, avoiding the transformation of the Markov decision process into a time-varying one, which would make the whole optimization scheme brittle and susceptible to instability. The key distinction of ACRE lies in the way it handles and stores extrinsic and intrinsic rewards: ACRE is an off-policy, actor-critic-style RL algorithm that separately approximates the forward novelty return. ACRE ships with a Gaussian mixture model to calculate instantaneous novelty, although different options could also be integrated. Thanks to this effective early exploration, ACRE yields substantial improvements over alternative RL methods across a range of continuous-control RL environments, including learning from policy-misleading reward signals. An open-source implementation is available at https://github.com/athakapo/ACRE .
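Read as an algorithm description, the abstract highlights two mechanisms: an instantaneous novelty score computed from a Gaussian mixture model fitted to visited states, and separate handling of extrinsic and intrinsic rewards so that the novelty return can be approximated by its own critic. The sketch below illustrates those two ideas only in broad strokes; the class names, buffer layout, and hyperparameters are assumptions made for exposition and are not taken from the paper's reference implementation at https://github.com/athakapo/ACRE .

```python
# Illustrative sketch only: (a) score instantaneous state novelty with a
# Gaussian mixture model fitted to previously visited states, and
# (b) keep extrinsic and intrinsic rewards in separate replay-buffer fields
# so that separate critics can approximate each return.
# All names and defaults here are assumptions, not the authors' code.

import numpy as np
from sklearn.mixture import GaussianMixture


class GMMNovelty:
    """Scores how unfamiliar a state looks under a GMM of past states."""

    def __init__(self, n_components: int = 8):
        self.n_components = n_components
        self.gmm = None

    def fit(self, visited_states: np.ndarray) -> None:
        # Refit periodically on the states collected so far
        # (needs at least n_components samples).
        self.gmm = GaussianMixture(n_components=self.n_components).fit(visited_states)

    def novelty(self, state: np.ndarray) -> float:
        # Low likelihood under the mixture -> high novelty.
        if self.gmm is None:
            return 0.0
        log_lik = self.gmm.score_samples(state.reshape(1, -1))[0]
        return float(-log_lik)


class ReplayBuffer:
    """Stores extrinsic and intrinsic rewards side by side, never summed."""

    def __init__(self):
        self.transitions = []  # (s, a, r_ext, r_int, s_next, done)

    def add(self, s, a, r_ext, r_int, s_next, done):
        self.transitions.append((s, a, r_ext, r_int, s_next, done))

    def sample(self, batch_size: int):
        idx = np.random.randint(len(self.transitions), size=batch_size)
        return [self.transitions[i] for i in idx]
```

Under such a layout, a task critic would be trained on r_ext and a separate novelty critic on r_int, so the environment's own reward signal is never altered by the exploration bonus; this is only one plausible reading of the abstract's "reward-preserving" idea.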

Bibliographic Details
Published in: Neural Computing & Applications, 2023-10, Vol. 35 (30), pp. 22563-22576
Authors: Kapoutsis, Athanasios Ch.; Koutras, Dimitrios I.; Korkas, Christos D.; Kosmatopoulos, Elias B.
Format: Article
Language: English
Subjects: Algorithms; Artificial Intelligence; Computational Biology/Bioinformatics; Computational Science and Engineering; Computer Science; Data Mining and Knowledge Discovery; Image Processing and Computer Vision; Machine learning; Markov processes; Optimization; Probabilistic models; Probability and Statistics in Computer Science
Online access: Full text
DOI: 10.1007/s00521-023-08845-x
ISSN: 0941-0643
EISSN: 1433-3058
Publisher: Springer London