ACRE: Actor-Critic with Reward-Preserving Exploration
While reinforcement learning (RL) algorithms have generated impressive strategies for a wide range of tasks, the performance improvements in continuous-domain, real-world problems do not follow the same trend. Poor exploration and quick convergence to locally optimal solutions play a dominant role....
Saved in:
Published in: | Neural computing & applications 2023-10, Vol.35 (30), p.22563-22576 |
---|---|
Main authors: | Kapoutsis, Athanasios Ch; Koutras, Dimitrios I.; Korkas, Christos D.; Kosmatopoulos, Elias B. |
Format: | Article |
Language: | eng |
Subjects: | Algorithms; Artificial Intelligence; Machine learning; Markov processes; Optimization; Probabilistic models |
Online access: | Full text |
container_end_page | 22576 |
---|---|
container_issue | 30 |
container_start_page | 22563 |
container_title | Neural computing & applications |
container_volume | 35 |
creator | Kapoutsis, Athanasios Ch; Koutras, Dimitrios I.; Korkas, Christos D.; Kosmatopoulos, Elias B. |
description | While reinforcement learning (RL) algorithms have generated impressive strategies for a wide range of tasks, the performance improvements in continuous-domain, real-world problems do not follow the same trend. Poor exploration and quick convergence to locally optimal solutions play a dominant role. Advanced RL algorithms attempt to mitigate this issue by introducing exploration signals during the training procedure, and this successful integration has paved the way for signals from an intrinsic exploration branch. The ACRE algorithm is a framework that concretely describes the conditions for such an integration while avoiding turning the Markov decision process into a time-varying one, which would make the whole optimization scheme brittle and susceptible to instability. The key distinction of ACRE lies in the way it handles and stores both extrinsic and intrinsic rewards: ACRE is an off-policy, actor-critic-style RL algorithm that separately approximates the forward novelty return. ACRE ships with a Gaussian mixture model to calculate the instantaneous novelty, although different options can also be integrated. Using such effective early exploration, ACRE achieves substantial improvements over alternative RL methods in a range of continuous-control RL environments, including learning from policy-misleading reward signals. An open-source implementation is available at https://github.com/athakapo/ACRE. |
doi_str_mv | 10.1007/s00521-023-08845-x |
format | Article |
fulltext | fulltext |
identifier | ISSN: 0941-0643 |
ispartof | Neural computing & applications, 2023-10, Vol.35 (30), p.22563-22576 |
issn | 0941-0643; 1433-3058 |
language | eng |
recordid | cdi_proquest_journals_2865417010 |
source | SpringerLink Journals - AutoHoldings |
subjects | Algorithms; Artificial Intelligence; Computational Biology/Bioinformatics; Computational Science and Engineering; Computer Science; Data Mining and Knowledge Discovery; Image Processing and Computer Vision; Machine learning; Markov processes; Optimization; Original Article; Probabilistic models; Probability and Statistics in Computer Science |
title | ACRE: Actor-Critic with Reward-Preserving Exploration |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-28T16%3A57%3A12IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=ACRE:%20Actor-Critic%20with%20Reward-Preserving%20Exploration&rft.jtitle=Neural%20computing%20&%20applications&rft.au=Kapoutsis,%20Athanasios%20Ch&rft.date=2023-10-01&rft.volume=35&rft.issue=30&rft.spage=22563&rft.epage=22576&rft.pages=22563-22576&rft.issn=0941-0643&rft.eissn=1433-3058&rft_id=info:doi/10.1007/s00521-023-08845-x&rft_dat=%3Cproquest_cross%3E2865417010%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2865417010&rft_id=info:pmid/&rfr_iscdi=true |
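The record only links to the authors' implementation (https://github.com/athakapo/ACRE), so the following is a minimal, hedged Python sketch of the mechanism the abstract describes: a Gaussian mixture model scores the instantaneous novelty of visited states, and the novelty signal is stored separately from the extrinsic reward so that the extrinsic return and the forward novelty return can be estimated by two separate critics. The class name `NoveltyModel`, the use of scikit-learn's `GaussianMixture`, the discount factor, and the Monte-Carlo-style return computation are illustrative assumptions, not the paper's actual code.

```python
# Hedged illustration of GMM-based instantaneous novelty kept separate from
# the extrinsic reward (in the spirit of ACRE); names and values below are
# assumptions for the sketch, not the authors' implementation.
import numpy as np
from sklearn.mixture import GaussianMixture


class NoveltyModel:
    """Fits a Gaussian mixture over visited states; low likelihood = high novelty."""

    def __init__(self, n_components=8):
        self.gmm = GaussianMixture(n_components=n_components)
        self.fitted = False

    def update(self, states):
        # Refit the mixture on the states visited so far.
        self.gmm.fit(states)
        self.fitted = True

    def novelty(self, states):
        # Instantaneous novelty: negative log-likelihood under the current mixture.
        if not self.fitted:
            return np.zeros(len(states))
        return -self.gmm.score_samples(states)


# Toy rollout: extrinsic rewards and novelty signals are kept in separate
# arrays, mimicking separate columns in a replay buffer.
rng = np.random.default_rng(0)
states = rng.normal(size=(256, 3))       # toy 3-D state samples
ext_rewards = rng.normal(size=256)       # extrinsic rewards from the task

nov = NoveltyModel(n_components=4)
nov.update(states)
int_rewards = nov.novelty(states)        # intrinsic novelty, stored separately

gamma = 0.99


def discounted_return(rewards, gamma):
    # Backward pass computing discounted returns for one trajectory.
    g, out = 0.0, np.empty_like(rewards)
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        out[t] = g
    return out


# Two separate return estimates, never mixed into a single reward signal,
# so the underlying MDP is not turned into a time-varying one.
ext_return = discounted_return(ext_rewards, gamma)
nov_return = discounted_return(int_rewards, gamma)
print(ext_return[:3], nov_return[:3])
```

In an actor-critic setting, each of the two return streams would feed its own critic; the sketch only shows the bookkeeping that keeps the extrinsic and novelty signals separate, which the abstract identifies as ACRE's key distinction.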