Mitigating Catastrophic Forgetting in Robot Continual Learning: A Guided Policy Search Approach Enhanced With Memory-Aware Synapses
Complex operational scenarios increasingly demand that industrial robots sequentially resolve multiple interrelated problems to accomplish complex operational tasks, necessitating robots to have the capacity for not only learning through interaction with the environment but also for continual learning...
Saved in:
Published in: | IEEE robotics and automation letters 2024-12, Vol.9 (12), p.11242-11249 |
---|---|
Main authors: | Dong, Qingwei; Zeng, Peng; He, Yunpeng; Wan, Guangxi; Dong, Xiaoting |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Order full text |
container_end_page | 11249 |
---|---|
container_issue | 12 |
container_start_page | 11242 |
container_title | IEEE robotics and automation letters |
container_volume | 9 |
creator | Dong, Qingwei; Zeng, Peng; He, Yunpeng; Wan, Guangxi; Dong, Xiaoting |
description | Complex operational scenarios increasingly demand that industrial robots sequentially resolve multiple interrelated problems to accomplish complex operational tasks, necessitating robots to have the capacity for not only learning through interaction with the environment but also for continual learning. Current deep reinforcement learning methods have demonstrated substantial prowess in enabling robots to learn individual simple operational skills. However, catastrophic forgetting regarding the continual learning of various distinct tasks under a unified control policy remains a challenge. The lengthy sequential decision-making trajectory in reinforcement learning scenarios results in a massive state-action search space for the agent. Moreover, low-value state-action samples exacerbate the difficulty of continuous learning in reinforcement learning problems. In this letter, we propose a Continual Reinforcement Learning (CRL) method that accommodates the incremental multiskill learning demands of robots. We transform the tightly coupled structure in Guided Policy Search (GPS) algorithms, which closely intertwine local and global policies, into a loosely coupled structure. This revised structure updates the global policy only after the local policy for a specific task has converged, enabling online learning. In incrementally learning new tasks, the global policy is updated using hard parameter sharing and Memory Aware Synapses (MAS), creating task-specific layers while penalizing significant parameter changes in shared layers linked to prior tasks. This method reduces overfitting and mitigates catastrophic forgetting in robotic CRL. We validate our method on PR2, UR5 and Sawyer robots in simulators as well as on a real UR5 robot. |
doi_str_mv | 10.1109/LRA.2024.3487484 |
format | Article |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 2377-3766 |
ispartof | IEEE robotics and automation letters, 2024-12, Vol.9 (12), p.11242-11249 |
issn | 2377-3766 |
language | eng |
recordid | cdi_proquest_journals_3126088780 |
source | IEEE/IET Electronic Library (IEL) |
subjects | Algorithms; catastrophic forgetting; Computational modeling; continual learning; Continuing education; Deep learning; Deep reinforcement learning; Distance learning; Flight simulators; Global Positioning System; Heuristic algorithms; Industrial robots; Machine learning; Memory tasks; Neural networks; Parameters; Reinforcement learning; Robot control; Robot learning; Robots; Searching; sequential multitask learning; Synapses; Task complexity; Training; Trajectory |
title | Mitigating Catastrophic Forgetting in Robot Continual Learning: A Guided Policy Search Approach Enhanced With Memory-Aware Synapses |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-28T07%3A47%3A15IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Mitigating%20Catastrophic%20Forgetting%20in%20Robot%20Continual%20Learning:%20A%20Guided%20Policy%20Search%20Approach%20Enhanced%20With%20Memory-Aware%20Synapses&rft.jtitle=IEEE%20robotics%20and%20automation%20letters&rft.au=Dong,%20Qingwei&rft.date=2024-12-01&rft.volume=9&rft.issue=12&rft.spage=11242&rft.epage=11249&rft.pages=11242-11249&rft.issn=2377-3766&rft.eissn=2377-3766&rft.coden=IRALC6&rft_id=info:doi/10.1109/LRA.2024.3487484&rft_dat=%3Cproquest_RIE%3E3126088780%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3126088780&rft_id=info:pmid/&rft_ieee_id=10737442&rfr_iscdi=true |
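The abstract above describes penalizing significant changes to shared-layer parameters via Memory Aware Synapses (MAS): parameter importances are estimated from the sensitivity of the learned function's output to each parameter, and a quadratic penalty then anchors important parameters near their values from previous tasks. The sketch below is illustrative only, not the paper's implementation: it uses a plain linear map `f(x) = W @ x` as a stand-in for the global policy network, and the names `mas_importance`, `mas_penalty`, and `lam` are assumptions introduced here for clarity.

```python
import numpy as np

def mas_importance(W, X):
    """Estimate MAS importances for a linear policy f(x) = W @ x.

    MAS defines Omega_ij as the mean absolute gradient of the squared
    L2 norm of the output w.r.t. each parameter; for a linear map this
    gradient is available in closed form: d||Wx||^2 / dW = 2 (W x) x^T.
    """
    omega = np.zeros_like(W)
    for x in X:
        y = W @ x
        omega += np.abs(2.0 * np.outer(y, x))
    return omega / len(X)

def mas_penalty(W, W_old, omega, lam=1.0):
    """Quadratic penalty that discourages moving parameters that were
    important for earlier tasks; added to the new task's training loss."""
    return lam * np.sum(omega * (W - W_old) ** 2)

# Usage sketch: importances are accumulated on states visited under the
# previous task, then the penalty is evaluated for a candidate update.
rng = np.random.default_rng(0)
W_old = rng.normal(size=(2, 3))            # weights after the previous task
X = rng.normal(size=(100, 3))              # states from the previous task
omega = mas_importance(W_old, X)

W_new = W_old + 0.1 * rng.normal(size=W_old.shape)  # candidate new-task update
penalty = mas_penalty(W_new, W_old, omega, lam=0.1)
```

Note that MAS needs no labels or rewards to compute `omega` (only the function's outputs), which is what makes it convenient for regularizing the shared layers of an incrementally trained global policy.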