Mitigating Catastrophic Forgetting in Robot Continual Learning: A Guided Policy Search Approach Enhanced With Memory-Aware Synapses



Bibliographic Details

Published in: IEEE Robotics and Automation Letters, 2024-12, Vol. 9 (12), pp. 11242-11249
Authors: Dong, Qingwei; Zeng, Peng; He, Yunpeng; Wan, Guangxi; Dong, Xiaoting
Format: Article
Language: English
Online access: Order full text
Abstract: Complex operational scenarios increasingly demand that industrial robots sequentially resolve multiple interrelated problems to accomplish complex operational tasks, necessitating robots to have the capacity for not only learning through interaction with the environment but also for continual learning. Current deep reinforcement learning methods have demonstrated substantial prowess in enabling robots to learn individual simple operational skills. However, catastrophic forgetting regarding the continual learning of various distinct tasks under a unified control policy remains a challenge. The lengthy sequential decision-making trajectory in reinforcement learning scenarios results in a massive state-action search space for the agent. Moreover, low-value state-action samples exacerbate the difficulty of continuous learning in reinforcement learning problems. In this letter, we propose a Continual Reinforcement Learning (CRL) method that accommodates the incremental multiskill learning demands of robots. We transform the tightly coupled structure in Guided Policy Search (GPS) algorithms, which closely intertwine local and global policies, into a loosely coupled structure. This revised structure updates the global policy only after the local policy for a specific task has converged, enabling online learning. In incrementally learning new tasks, the global policy is updated using hard parameter sharing and Memory Aware Synapses (MAS), creating task-specific layers while penalizing significant parameter changes in shared layers linked to prior tasks. This method reduces overfitting and mitigates catastrophic forgetting in robotic CRL. We validate our method on PR2, UR5 and Sawyer robots in simulators as well as on a real UR5 robot.
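The abstract's key mechanism, Memory Aware Synapses, estimates how important each shared-layer parameter was to previously learned tasks and then penalizes changes to the important ones when a new task is trained. A minimal NumPy sketch of that idea is below, assuming a toy linear "policy" for illustration; the function names (`mas_importance`, `mas_penalty`) and the linear model are illustrative choices, not the paper's implementation.

```python
import numpy as np

def mas_importance(weights, inputs):
    """MAS importance: mean absolute gradient of the squared L2 norm of the
    output with respect to each parameter. For a linear map y = W x,
    d(||y||^2)/dW = 2 y x^T, so no autograd library is needed here."""
    omega = np.zeros_like(weights)
    for x in inputs:
        y = weights @ x
        omega += np.abs(2.0 * np.outer(y, x))
    return omega / len(inputs)

def mas_penalty(weights, old_weights, omega, lam=1.0):
    """Quadratic penalty on drift of parameters that were important for
    earlier tasks; added to the new task's loss during continual learning."""
    return lam * np.sum(omega * (weights - old_weights) ** 2)

# Toy usage: unimportant parameters (small omega) may move freely,
# important ones are held near their previous-task values.
rng = np.random.default_rng(0)
W_old = rng.standard_normal((3, 4))
data = [rng.standard_normal(4) for _ in range(16)]
omega = mas_importance(W_old, data)
loss_term = mas_penalty(W_old + 0.1, W_old, omega)
```

The penalty is zero when the shared weights are unchanged and grows with movement weighted by importance, which is what lets the shared layers absorb new tasks without overwriting what mattered for old ones.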
DOI: 10.1109/LRA.2024.3487484
ISSN: 2377-3766
Source: IEEE/IET Electronic Library (IEL)
Subjects:
Algorithms
catastrophic forgetting
Computational modeling
continual learning
Continuing education
Deep learning
Deep reinforcement learning
Distance learning
Flight simulators
Global Positioning System
Heuristic algorithms
Industrial robots
Machine learning
Memory tasks
Neural networks
Parameters
Reinforcement learning
Robot control
Robot learning
Robots
Searching
sequential multitask learning
Synapses
Task complexity
Training
Trajectory
URL: https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-28T07%3A47%3A15IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Mitigating%20Catastrophic%20Forgetting%20in%20Robot%20Continual%20Learning:%20A%20Guided%20Policy%20Search%20Approach%20Enhanced%20With%20Memory-Aware%20Synapses&rft.jtitle=IEEE%20robotics%20and%20automation%20letters&rft.au=Dong,%20Qingwei&rft.date=2024-12-01&rft.volume=9&rft.issue=12&rft.spage=11242&rft.epage=11249&rft.pages=11242-11249&rft.issn=2377-3766&rft.eissn=2377-3766&rft.coden=IRALC6&rft_id=info:doi/10.1109/LRA.2024.3487484&rft_dat=%3Cproquest_RIE%3E3126088780%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3126088780&rft_id=info:pmid/&rft_ieee_id=10737442&rfr_iscdi=true