ACRE: Actor-Critic with Reward-Preserving Exploration

While reinforcement learning (RL) algorithms have generated impressive strategies for a wide range of tasks, performance improvements in continuous-domain, real-world problems have not followed the same trend, with poor exploration and quick convergence to locally optimal solutions playing a dominant role. Advanced RL algorithms attempt to mitigate this issue by introducing exploration signals during the training procedure, paving the way for signals from an intrinsic exploration branch. The ACRE algorithm is a framework that concretely describes the conditions for such an integration, avoiding the transformation of the Markov decision process into a time-varying one, which would make the whole optimization scheme brittle and susceptible to instability. The key distinction of ACRE lies in the way it handles and stores extrinsic and intrinsic rewards: ACRE is an off-policy, actor-critic-style RL algorithm that separately approximates the forward novelty return. ACRE ships with a Gaussian mixture model to calculate instantaneous novelty, although different options could also be integrated. Thanks to this effective early exploration, ACRE yields substantial improvements over alternative RL methods across a range of continuous-control RL environments, including learning from policy-misleading reward signals. An open-source implementation is available at https://github.com/athakapo/ACRE .
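Read as an algorithm description, the abstract highlights two mechanisms: an instantaneous novelty score computed from a Gaussian mixture model fitted to visited states, and separate handling of extrinsic and intrinsic rewards so that the novelty return can be approximated by its own critic. The sketch below illustrates those two ideas only in broad strokes; the class names, buffer layout, and hyperparameters are assumptions made for exposition and are not taken from the paper's reference implementation at https://github.com/athakapo/ACRE .

```python
# Illustrative sketch only: (a) score instantaneous state novelty with a
# Gaussian mixture model fitted to previously visited states, and
# (b) keep extrinsic and intrinsic rewards in separate replay-buffer fields
# so that separate critics can approximate each return.
# All names and defaults here are assumptions, not the authors' code.

import numpy as np
from sklearn.mixture import GaussianMixture


class GMMNovelty:
    """Scores how unfamiliar a state looks under a GMM of past states."""

    def __init__(self, n_components: int = 8):
        self.n_components = n_components
        self.gmm = None

    def fit(self, visited_states: np.ndarray) -> None:
        # Refit periodically on the states collected so far
        # (needs at least n_components samples).
        self.gmm = GaussianMixture(n_components=self.n_components).fit(visited_states)

    def novelty(self, state: np.ndarray) -> float:
        # Low likelihood under the mixture -> high novelty.
        if self.gmm is None:
            return 0.0
        log_lik = self.gmm.score_samples(state.reshape(1, -1))[0]
        return float(-log_lik)


class ReplayBuffer:
    """Stores extrinsic and intrinsic rewards side by side, never summed."""

    def __init__(self):
        self.transitions = []  # (s, a, r_ext, r_int, s_next, done)

    def add(self, s, a, r_ext, r_int, s_next, done):
        self.transitions.append((s, a, r_ext, r_int, s_next, done))

    def sample(self, batch_size: int):
        idx = np.random.randint(len(self.transitions), size=batch_size)
        return [self.transitions[i] for i in idx]
```

Under such a layout, a task critic would be trained on r_ext and a separate novelty critic on r_int, so the environment's own reward signal is never altered by the exploration bonus; this is only one plausible reading of the abstract's "reward-preserving" idea.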

Bibliographic Details
Published in: Neural Computing & Applications, 2023-10, Vol. 35 (30), pp. 22563-22576
Authors: Kapoutsis, Athanasios Ch.; Koutras, Dimitrios I.; Korkas, Christos D.; Kosmatopoulos, Elias B.
Format: Article
Language: English
Subjects: Algorithms; Artificial Intelligence; Computational Biology/Bioinformatics; Computational Science and Engineering; Computer Science; Data Mining and Knowledge Discovery; Image Processing and Computer Vision; Machine learning; Markov processes; Optimization; Probabilistic models; Probability and Statistics in Computer Science
Online access: Full text
DOI: 10.1007/s00521-023-08845-x
ISSN: 0941-0643
EISSN: 1433-3058
Publisher: Springer London