Learning to Generate All Feasible Actions
Modern cyber-physical systems are becoming increasingly complex to model, thus motivating data-driven techniques such as reinforcement learning (RL) to find appropriate control agents. However, most systems are subject to hard constraints such as safety or operational bounds. Typically, to learn to...
Published in: | IEEE access 2024, Vol.12, p.40668-40681 |
---|---|
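Main authors: | Theile, Mirco ; Bernardini, Daniele ; Trumpp, Raphael ; Piazza, Cristina ; Caccamo, Marco ; Sangiovanni-Vincentelli, Alberto L. |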
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 40681 |
---|---|
container_issue | |
container_start_page | 40668 |
container_title | IEEE access |
container_volume | 12 |
creator | Theile, Mirco ; Bernardini, Daniele ; Trumpp, Raphael ; Piazza, Cristina ; Caccamo, Marco ; Sangiovanni-Vincentelli, Alberto L. |
description | Modern cyber-physical systems are becoming increasingly complex to model, thus motivating data-driven techniques such as reinforcement learning (RL) to find appropriate control agents. However, most systems are subject to hard constraints such as safety or operational bounds. Typically, to learn to satisfy these constraints, the agent must violate them systematically, which is computationally prohibitive in most systems. Recent efforts aim to utilize feasibility models that assess whether a proposed action is feasible to avoid applying the agent's infeasible action proposals to the system. However, these efforts focus on guaranteeing constraint satisfaction rather than the agent's learning efficiency. To improve the learning process, we introduce action mapping, a novel approach that divides the learning process into two steps: first learn feasibility and subsequently, the objective by mapping actions into the sets of feasible actions. This paper focuses on the feasibility part by learning to generate all feasible actions through self-supervised querying of the feasibility model. We train the agent by formulating the problem as a distribution matching problem and deriving gradient estimators for different divergences. Through an illustrative example, a robotic path planning scenario, and a robotic grasping simulation, we demonstrate the agent's proficiency in generating actions across disconnected feasible action sets. By addressing the feasibility step, this paper makes it possible to focus future work on the objective part of action mapping, paving the way for an RL framework that is both safe and efficient. |
doi_str_mv | 10.1109/ACCESS.2024.3376739 |
format | Article |
fullrecord | <record><control><sourceid>proquest_ieee_</sourceid><recordid>TN_cdi_proquest_journals_2973240169</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10471398</ieee_id><doaj_id>oai_doaj_org_article_ef2417f41bf04b60936960200959b06a</doaj_id><sourcerecordid>2973240169</sourcerecordid><originalsourceid>FETCH-LOGICAL-c359t-fc9f4990102022b28537c49384c6c1e163c66057ad3242711d112c07f149db2e3</originalsourceid><addsrcrecordid>eNpNkDFPwzAQhS0EElXpL4AhEhNDis927HisorZUqsRQmC3HsatUIS52OvDvcUmFeovPp3vvnj6EHgHPAbB8XVTVcrebE0zYnFLBBZU3aEKAy5wWlN9e9fdoFuMBpyrTqBAT9LK1OvRtv88Gn61tb4MebLboumxldWzrLn3M0Po-PqA7p7toZ5d3ij5Xy4_qLd--rzfVYpsbWsghd0Y6JiUGnAKRmpQFFYZJWjLDDVjg1HCOC6EbShgRAA0AMVg4YLKpiaVTtBl9G68P6hjaLx1-lNet-hv4sFc6DK3prLKOMBCOQe0wqzmWlEue7mJZyBpznbyeR69j8N8nGwd18KfQp_iKSJEC4IQhbdFxywQfY7Du_ypgdUasRsTqjFhdECfV06hqrbVXCiaAypL-As2ictg</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2973240169</pqid></control><display><type>article</type><title>Learning to Generate All Feasible Actions</title><source>Directory of Open Access Journals</source><source>IEEE Xplore Open Access Journals</source><source>EZB Electronic Journals Library</source><creator>Theile, Mirco ; Bernardini, Daniele ; Trumpp, Raphael ; Piazza, Cristina ; Caccamo, Marco ; Sangiovanni-Vincentelli, Alberto L.</creator><creatorcontrib>Theile, Mirco ; Bernardini, Daniele ; Trumpp, Raphael ; Piazza, Cristina ; Caccamo, Marco ; Sangiovanni-Vincentelli, Alberto L.</creatorcontrib><description>Modern cyber-physical systems are becoming increasingly complex to model, thus motivating data-driven techniques such as reinforcement learning (RL) to find appropriate control agents. However, most systems are subject to hard constraints such as safety or operational bounds. Typically, to learn to satisfy these constraints, the agent must violate them systematically, which is computationally prohibitive in most systems. 
Recent efforts aim to utilize feasibility models that assess whether a proposed action is feasible to avoid applying the agent's infeasible action proposals to the system. However, these efforts focus on guaranteeing constraint satisfaction rather than the agent's learning efficiency. To improve the learning process, we introduce action mapping, a novel approach that divides the learning process into two steps: first learn feasibility and subsequently, the objective by mapping actions into the sets of feasible actions. This paper focuses on the feasibility part by learning to generate all feasible actions through self-supervised querying of the feasibility model. We train the agent by formulating the problem as a distribution matching problem and deriving gradient estimators for different divergences. Through an illustrative example, a robotic path planning scenario, and a robotic grasping simulation, we demonstrate the agent's proficiency in generating actions across disconnected feasible action sets. By addressing the feasibility step, this paper makes it possible to focus future work on the objective part of action mapping, paving the way for an RL framework that is both safe and efficient.</description><identifier>ISSN: 2169-3536</identifier><identifier>EISSN: 2169-3536</identifier><identifier>DOI: 10.1109/ACCESS.2024.3376739</identifier><identifier>CODEN: IAECCG</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Action mapping ; Bandwidth ; Cyber-physical systems ; Feasibility ; Feasibility studies ; generative neural network ; Grasping ; Grasping (robotics) ; Learning ; Mapping ; Neural networks ; Optimization ; Proposals ; Robots ; Safety ; Self-supervised learning ; Training</subject><ispartof>IEEE access, 2024, Vol.12, p.40668-40681</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2024</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c359t-fc9f4990102022b28537c49384c6c1e163c66057ad3242711d112c07f149db2e3</cites><orcidid>0000-0003-1574-8858 ; 0000-0002-0358-8677 ; 0000-0003-2328-044X ; 0000-0002-9416-9557 ; 0000-0003-3902-7916 ; 0000-0003-1298-8389</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10471398$$EHTML$$P50$$Gieee$$Hfree_for_read</linktohtml><link.rule.ids>314,780,784,864,2102,4024,27633,27923,27924,27925,54933</link.rule.ids></links><search><creatorcontrib>Theile, Mirco</creatorcontrib><creatorcontrib>Bernardini, Daniele</creatorcontrib><creatorcontrib>Trumpp, Raphael</creatorcontrib><creatorcontrib>Piazza, Cristina</creatorcontrib><creatorcontrib>Caccamo, Marco</creatorcontrib><creatorcontrib>Sangiovanni-Vincentelli, Alberto L.</creatorcontrib><title>Learning to Generate All Feasible Actions</title><title>IEEE access</title><addtitle>Access</addtitle><description>Modern cyber-physical systems are becoming increasingly complex to model, thus motivating data-driven techniques such as reinforcement learning (RL) to find appropriate control agents. However, most systems are subject to hard constraints such as safety or operational bounds. Typically, to learn to satisfy these constraints, the agent must violate them systematically, which is computationally prohibitive in most systems. Recent efforts aim to utilize feasibility models that assess whether a proposed action is feasible to avoid applying the agent's infeasible action proposals to the system. However, these efforts focus on guaranteeing constraint satisfaction rather than the agent's learning efficiency. 
To improve the learning process, we introduce action mapping, a novel approach that divides the learning process into two steps: first learn feasibility and subsequently, the objective by mapping actions into the sets of feasible actions. This paper focuses on the feasibility part by learning to generate all feasible actions through self-supervised querying of the feasibility model. We train the agent by formulating the problem as a distribution matching problem and deriving gradient estimators for different divergences. Through an illustrative example, a robotic path planning scenario, and a robotic grasping simulation, we demonstrate the agent's proficiency in generating actions across disconnected feasible action sets. By addressing the feasibility step, this paper makes it possible to focus future work on the objective part of action mapping, paving the way for an RL framework that is both safe and efficient.</description><subject>Action mapping</subject><subject>Bandwidth</subject><subject>Cyber-physical systems</subject><subject>Feasibility</subject><subject>Feasibility studies</subject><subject>generative neural network</subject><subject>Grasping</subject><subject>Grasping (robotics)</subject><subject>Learning</subject><subject>Mapping</subject><subject>Neural networks</subject><subject>Optimization</subject><subject>Proposals</subject><subject>Robots</subject><subject>Safety</subject><subject>Self-supervised 
learning</subject><subject>Training</subject><issn>2169-3536</issn><issn>2169-3536</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>ESBDL</sourceid><sourceid>RIE</sourceid><sourceid>DOA</sourceid><recordid>eNpNkDFPwzAQhS0EElXpL4AhEhNDis927HisorZUqsRQmC3HsatUIS52OvDvcUmFeovPp3vvnj6EHgHPAbB8XVTVcrebE0zYnFLBBZU3aEKAy5wWlN9e9fdoFuMBpyrTqBAT9LK1OvRtv88Gn61tb4MebLboumxldWzrLn3M0Po-PqA7p7toZ5d3ij5Xy4_qLd--rzfVYpsbWsghd0Y6JiUGnAKRmpQFFYZJWjLDDVjg1HCOC6EbShgRAA0AMVg4YLKpiaVTtBl9G68P6hjaLx1-lNet-hv4sFc6DK3prLKOMBCOQe0wqzmWlEue7mJZyBpznbyeR69j8N8nGwd18KfQp_iKSJEC4IQhbdFxywQfY7Du_ypgdUasRsTqjFhdECfV06hqrbVXCiaAypL-As2ictg</recordid><startdate>2024</startdate><enddate>2024</enddate><creator>Theile, Mirco</creator><creator>Bernardini, Daniele</creator><creator>Trumpp, Raphael</creator><creator>Piazza, Cristina</creator><creator>Caccamo, Marco</creator><creator>Sangiovanni-Vincentelli, Alberto L.</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>ESBDL</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>7SR</scope><scope>8BQ</scope><scope>8FD</scope><scope>JG9</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0003-1574-8858</orcidid><orcidid>https://orcid.org/0000-0002-0358-8677</orcidid><orcidid>https://orcid.org/0000-0003-2328-044X</orcidid><orcidid>https://orcid.org/0000-0002-9416-9557</orcidid><orcidid>https://orcid.org/0000-0003-3902-7916</orcidid><orcidid>https://orcid.org/0000-0003-1298-8389</orcidid></search><sort><creationdate>2024</creationdate><title>Learning to Generate All Feasible Actions</title><author>Theile, Mirco ; Bernardini, Daniele ; Trumpp, Raphael ; Piazza, Cristina ; Caccamo, Marco ; Sangiovanni-Vincentelli, Alberto L.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c359t-fc9f4990102022b28537c49384c6c1e163c66057ad3242711d112c07f149db2e3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Action mapping</topic><topic>Bandwidth</topic><topic>Cyber-physical systems</topic><topic>Feasibility</topic><topic>Feasibility studies</topic><topic>generative neural network</topic><topic>Grasping</topic><topic>Grasping (robotics)</topic><topic>Learning</topic><topic>Mapping</topic><topic>Neural networks</topic><topic>Optimization</topic><topic>Proposals</topic><topic>Robots</topic><topic>Safety</topic><topic>Self-supervised learning</topic><topic>Training</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Theile, Mirco</creatorcontrib><creatorcontrib>Bernardini, Daniele</creatorcontrib><creatorcontrib>Trumpp, Raphael</creatorcontrib><creatorcontrib>Piazza, Cristina</creatorcontrib><creatorcontrib>Caccamo, 
Marco</creatorcontrib><creatorcontrib>Sangiovanni-Vincentelli, Alberto L.</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005–Present</collection><collection>IEEE Xplore Open Access Journals</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998–Present</collection><collection>IEEE/IET Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Directory of Open Access Journals</collection><jtitle>IEEE access</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Theile, Mirco</au><au>Bernardini, Daniele</au><au>Trumpp, Raphael</au><au>Piazza, Cristina</au><au>Caccamo, Marco</au><au>Sangiovanni-Vincentelli, Alberto L.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Learning to Generate All Feasible Actions</atitle><jtitle>IEEE access</jtitle><stitle>Access</stitle><date>2024</date><risdate>2024</risdate><volume>12</volume><spage>40668</spage><epage>40681</epage><pages>40668-40681</pages><issn>2169-3536</issn><eissn>2169-3536</eissn><coden>IAECCG</coden><abstract>Modern cyber-physical systems are becoming increasingly complex to model, thus motivating data-driven techniques such as reinforcement learning (RL) to find appropriate control agents. 
However, most systems are subject to hard constraints such as safety or operational bounds. Typically, to learn to satisfy these constraints, the agent must violate them systematically, which is computationally prohibitive in most systems. Recent efforts aim to utilize feasibility models that assess whether a proposed action is feasible to avoid applying the agent's infeasible action proposals to the system. However, these efforts focus on guaranteeing constraint satisfaction rather than the agent's learning efficiency. To improve the learning process, we introduce action mapping, a novel approach that divides the learning process into two steps: first learn feasibility and subsequently, the objective by mapping actions into the sets of feasible actions. This paper focuses on the feasibility part by learning to generate all feasible actions through self-supervised querying of the feasibility model. We train the agent by formulating the problem as a distribution matching problem and deriving gradient estimators for different divergences. Through an illustrative example, a robotic path planning scenario, and a robotic grasping simulation, we demonstrate the agent's proficiency in generating actions across disconnected feasible action sets. By addressing the feasibility step, this paper makes it possible to focus future work on the objective part of action mapping, paving the way for an RL framework that is both safe and efficient.</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/ACCESS.2024.3376739</doi><tpages>14</tpages><orcidid>https://orcid.org/0000-0003-1574-8858</orcidid><orcidid>https://orcid.org/0000-0002-0358-8677</orcidid><orcidid>https://orcid.org/0000-0003-2328-044X</orcidid><orcidid>https://orcid.org/0000-0002-9416-9557</orcidid><orcidid>https://orcid.org/0000-0003-3902-7916</orcidid><orcidid>https://orcid.org/0000-0003-1298-8389</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 2169-3536 |
ispartof | IEEE access, 2024, Vol.12, p.40668-40681 |
issn | 2169-3536 2169-3536 |
language | eng |
recordid | cdi_proquest_journals_2973240169 |
source | Directory of Open Access Journals; IEEE Xplore Open Access Journals; EZB Electronic Journals Library |
subjects | Action mapping ; Bandwidth ; Cyber-physical systems ; Feasibility ; Feasibility studies ; generative neural network ; Grasping ; Grasping (robotics) ; Learning ; Mapping ; Neural networks ; Optimization ; Proposals ; Robots ; Safety ; Self-supervised learning ; Training |
title | Learning to Generate All Feasible Actions |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T19%3A42%3A19IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Learning%20to%20Generate%20All%20Feasible%20Actions&rft.jtitle=IEEE%20access&rft.au=Theile,%20Mirco&rft.date=2024&rft.volume=12&rft.spage=40668&rft.epage=40681&rft.pages=40668-40681&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2024.3376739&rft_dat=%3Cproquest_ieee_%3E2973240169%3C/proquest_ieee_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2973240169&rft_id=info:pmid/&rft_ieee_id=10471398&rft_doaj_id=oai_doaj_org_article_ef2417f41bf04b60936960200959b06a&rfr_iscdi=true |