Mitigating Catastrophic Forgetting in Robot Continual Learning: A Guided Policy Search Approach Enhanced With Memory-Aware Synapses
Complex operational scenarios increasingly demand that industrial robots sequentially resolve multiple interrelated problems to accomplish complex operational tasks, necessitating robots to have the capacity for not only learning through interaction with the environment but also for continual learning...
Saved in:
Published in: | IEEE robotics and automation letters 2024-12, Vol.9 (12), p.11242-11249 |
---|---|
Main authors: | Dong, Qingwei; Zeng, Peng; He, Yunpeng; Wan, Guangxi; Dong, Xiaoting |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Order full text |
container_end_page | 11249 |
---|---|
container_issue | 12 |
container_start_page | 11242 |
container_title | IEEE robotics and automation letters |
container_volume | 9 |
creator | Dong, Qingwei; Zeng, Peng; He, Yunpeng; Wan, Guangxi; Dong, Xiaoting |
description | Complex operational scenarios increasingly demand that industrial robots sequentially resolve multiple interrelated problems to accomplish complex operational tasks, necessitating robots to have the capacity for not only learning through interaction with the environment but also for continual learning. Current deep reinforcement learning methods have demonstrated substantial prowess in enabling robots to learn individual simple operational skills. However, catastrophic forgetting regarding the continual learning of various distinct tasks under a unified control policy remains a challenge. The lengthy sequential decision-making trajectory in reinforcement learning scenarios results in a massive state-action search space for the agent. Moreover, low-value state-action samples exacerbate the difficulty of continuous learning in reinforcement learning problems. In this letter, we propose a Continual Reinforcement Learning (CRL) method that accommodates the incremental multiskill learning demands of robots. We transform the tightly coupled structure in Guided Policy Search (GPS) algorithms, which closely intertwine local and global policies, into a loosely coupled structure. This revised structure updates the global policy only after the local policy for a specific task has converged, enabling online learning. In incrementally learning new tasks, the global policy is updated using hard parameter sharing and Memory Aware Synapses (MAS), creating task-specific layers while penalizing significant parameter changes in shared layers linked to prior tasks. This method reduces overfitting and mitigates catastrophic forgetting in robotic CRL. We validate our method on PR2, UR5 and Sawyer robots in simulators as well as on a real UR5 robot. |
doi_str_mv | 10.1109/LRA.2024.3487484 |
format | Article |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 2377-3766 |
ispartof | IEEE robotics and automation letters, 2024-12, Vol.9 (12), p.11242-11249 |
issn | 2377-3766 |
language | eng |
recordid | cdi_proquest_journals_3126088780 |
source | IEEE/IET Electronic Library (IEL) |
subjects | Algorithms; catastrophic forgetting; Computational modeling; continual learning; Continuing education; Deep learning; Deep reinforcement learning; Distance learning; Flight simulators; Global Positioning System; Heuristic algorithms; Industrial robots; Machine learning; Memory tasks; Neural networks; Parameters; Reinforcement learning; Robot control; Robot learning; Robots; Searching; sequential multitask learning; Synapses; Task complexity; Training; Trajectory |
title | Mitigating Catastrophic Forgetting in Robot Continual Learning: A Guided Policy Search Approach Enhanced With Memory-Aware Synapses |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-28T07%3A47%3A15IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Mitigating%20Catastrophic%20Forgetting%20in%20Robot%20Continual%20Learning:%20A%20Guided%20Policy%20Search%20Approach%20Enhanced%20With%20Memory-Aware%20Synapses&rft.jtitle=IEEE%20robotics%20and%20automation%20letters&rft.au=Dong,%20Qingwei&rft.date=2024-12-01&rft.volume=9&rft.issue=12&rft.spage=11242&rft.epage=11249&rft.pages=11242-11249&rft.issn=2377-3766&rft.eissn=2377-3766&rft.coden=IRALC6&rft_id=info:doi/10.1109/LRA.2024.3487484&rft_dat=%3Cproquest_RIE%3E3126088780%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3126088780&rft_id=info:pmid/&rft_ieee_id=10737442&rfr_iscdi=true |
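The abstract above describes penalizing significant changes to shared-layer parameters via Memory Aware Synapses (MAS): parameter importances are estimated from the sensitivity of the learned function's output to each parameter, and a quadratic penalty then anchors important parameters near their values from previous tasks. The sketch below is illustrative only, not the paper's implementation: it uses a plain linear map `f(x) = W @ x` as a stand-in for the global policy network, and the names `mas_importance`, `mas_penalty`, and `lam` are assumptions introduced here for clarity.

```python
import numpy as np

def mas_importance(W, X):
    """Estimate MAS importances for a linear policy f(x) = W @ x.

    MAS defines Omega_ij as the mean absolute gradient of the squared
    L2 norm of the output w.r.t. each parameter; for a linear map this
    gradient is available in closed form: d||Wx||^2 / dW = 2 (W x) x^T.
    """
    omega = np.zeros_like(W)
    for x in X:
        y = W @ x
        omega += np.abs(2.0 * np.outer(y, x))
    return omega / len(X)

def mas_penalty(W, W_old, omega, lam=1.0):
    """Quadratic penalty that discourages moving parameters that were
    important for earlier tasks; added to the new task's training loss."""
    return lam * np.sum(omega * (W - W_old) ** 2)

# Usage sketch: importances are accumulated on states visited under the
# previous task, then the penalty is evaluated for a candidate update.
rng = np.random.default_rng(0)
W_old = rng.normal(size=(2, 3))            # weights after the previous task
X = rng.normal(size=(100, 3))              # states from the previous task
omega = mas_importance(W_old, X)

W_new = W_old + 0.1 * rng.normal(size=W_old.shape)  # candidate new-task update
penalty = mas_penalty(W_new, W_old, omega, lam=0.1)
```

Note that MAS needs no labels or rewards to compute `omega` (only the function's outputs), which is what makes it convenient for regularizing the shared layers of an incrementally trained global policy.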