Learning to Generate All Feasible Actions

Modern cyber-physical systems are becoming increasingly complex to model, thus motivating data-driven techniques such as reinforcement learning (RL) to find appropriate control agents. However, most systems are subject to hard constraints such as safety or operational bounds. Typically, to learn to...

Bibliographic details
Published in:	IEEE access, 2024, Vol.12, p.40668-40681
Main authors:	Theile, Mirco; Bernardini, Daniele; Trumpp, Raphael; Piazza, Cristina; Caccamo, Marco; Sangiovanni-Vincentelli, Alberto L.
Format:	Article
Language:	eng
Subjects:	Action mapping; Bandwidth; Cyber-physical systems; Feasibility; Feasibility studies; generative neural network; Grasping; Grasping (robotics); Learning; Mapping; Neural networks; Optimization; Proposals; Robots; Safety; Self-supervised learning; Training
Online access:	Full text

container_end_page	40681
container_issue
container_start_page	40668
container_title	IEEE access
container_volume	12
creator	Theile, Mirco; Bernardini, Daniele; Trumpp, Raphael; Piazza, Cristina; Caccamo, Marco; Sangiovanni-Vincentelli, Alberto L.
description	Modern cyber-physical systems are becoming increasingly complex to model, thus motivating data-driven techniques such as reinforcement learning (RL) to find appropriate control agents. However, most systems are subject to hard constraints such as safety or operational bounds. Typically, to learn to satisfy these constraints, the agent must violate them systematically, which is computationally prohibitive in most systems. Recent efforts aim to utilize feasibility models that assess whether a proposed action is feasible to avoid applying the agent's infeasible action proposals to the system. However, these efforts focus on guaranteeing constraint satisfaction rather than the agent's learning efficiency. To improve the learning process, we introduce action mapping, a novel approach that divides the learning process into two steps: first learn feasibility and subsequently, the objective by mapping actions into the sets of feasible actions. This paper focuses on the feasibility part by learning to generate all feasible actions through self-supervised querying of the feasibility model. We train the agent by formulating the problem as a distribution matching problem and deriving gradient estimators for different divergences. Through an illustrative example, a robotic path planning scenario, and a robotic grasping simulation, we demonstrate the agent's proficiency in generating actions across disconnected feasible action sets. By addressing the feasibility step, this paper makes it possible to focus future work on the objective part of action mapping, paving the way for an RL framework that is both safe and efficient.
doi_str_mv	10.1109/ACCESS.2024.3376739
format	Article
fullrecord	<record><control><sourceid>proquest_ieee_</sourceid><recordid>TN_cdi_proquest_journals_2973240169</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10471398</ieee_id><doaj_id>oai_doaj_org_article_ef2417f41bf04b60936960200959b06a</doaj_id><sourcerecordid>2973240169</sourcerecordid><originalsourceid>FETCH-LOGICAL-c359t-fc9f4990102022b28537c49384c6c1e163c66057ad3242711d112c07f149db2e3</originalsourceid><addsrcrecordid>eNpNkDFPwzAQhS0EElXpL4AhEhNDis927HisorZUqsRQmC3HsatUIS52OvDvcUmFeovPp3vvnj6EHgHPAbB8XVTVcrebE0zYnFLBBZU3aEKAy5wWlN9e9fdoFuMBpyrTqBAT9LK1OvRtv88Gn61tb4MebLboumxldWzrLn3M0Po-PqA7p7toZ5d3ij5Xy4_qLd--rzfVYpsbWsghd0Y6JiUGnAKRmpQFFYZJWjLDDVjg1HCOC6EbShgRAA0AMVg4YLKpiaVTtBl9G68P6hjaLx1-lNet-hv4sFc6DK3prLKOMBCOQe0wqzmWlEue7mJZyBpznbyeR69j8N8nGwd18KfQp_iKSJEC4IQhbdFxywQfY7Du_ypgdUasRsTqjFhdECfV06hqrbVXCiaAypL-As2ictg</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2973240169</pqid></control><display><type>article</type><title>Learning to Generate All Feasible Actions</title><source>Directory of Open Access Journals</source><source>IEEE Xplore Open Access Journals</source><source>EZB Electronic Journals Library</source><creator>Theile, Mirco ; Bernardini, Daniele ; Trumpp, Raphael ; Piazza, Cristina ; Caccamo, Marco ; Sangiovanni-Vincentelli, Alberto L.</creator><creatorcontrib>Theile, Mirco ; Bernardini, Daniele ; Trumpp, Raphael ; Piazza, Cristina ; Caccamo, Marco ; Sangiovanni-Vincentelli, Alberto L.</creatorcontrib><description>Modern cyber-physical systems are becoming increasingly complex to model, thus motivating data-driven techniques such as reinforcement learning (RL) to find appropriate control agents. However, most systems are subject to hard constraints such as safety or operational bounds. Typically, to learn to satisfy these constraints, the agent must violate them systematically, which is computationally prohibitive in most systems. 
Recent efforts aim to utilize feasibility models that assess whether a proposed action is feasible to avoid applying the agent's infeasible action proposals to the system. However, these efforts focus on guaranteeing constraint satisfaction rather than the agent's learning efficiency. To improve the learning process, we introduce action mapping, a novel approach that divides the learning process into two steps: first learn feasibility and subsequently, the objective by mapping actions into the sets of feasible actions. This paper focuses on the feasibility part by learning to generate all feasible actions through self-supervised querying of the feasibility model. We train the agent by formulating the problem as a distribution matching problem and deriving gradient estimators for different divergences. Through an illustrative example, a robotic path planning scenario, and a robotic grasping simulation, we demonstrate the agent's proficiency in generating actions across disconnected feasible action sets. By addressing the feasibility step, this paper makes it possible to focus future work on the objective part of action mapping, paving the way for an RL framework that is both safe and efficient.</description><identifier>ISSN: 2169-3536</identifier><identifier>EISSN: 2169-3536</identifier><identifier>DOI: 10.1109/ACCESS.2024.3376739</identifier><identifier>CODEN: IAECCG</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Action mapping ; Bandwidth ; Cyber-physical systems ; Feasibility ; Feasibility studies ; generative neural network ; Grasping ; Grasping (robotics) ; Learning ; Mapping ; Neural networks ; Optimization ; Proposals ; Robots ; Safety ; Self-supervised learning ; Training</subject><ispartof>IEEE access, 2024, Vol.12, p.40668-40681</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2024</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c359t-fc9f4990102022b28537c49384c6c1e163c66057ad3242711d112c07f149db2e3</cites><orcidid>0000-0003-1574-8858 ; 0000-0002-0358-8677 ; 0000-0003-2328-044X ; 0000-0002-9416-9557 ; 0000-0003-3902-7916 ; 0000-0003-1298-8389</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10471398$$EHTML$$P50$$Gieee$$Hfree_for_read</linktohtml><link.rule.ids>314,780,784,864,2102,4024,27633,27923,27924,27925,54933</link.rule.ids></links><search><creatorcontrib>Theile, Mirco</creatorcontrib><creatorcontrib>Bernardini, Daniele</creatorcontrib><creatorcontrib>Trumpp, Raphael</creatorcontrib><creatorcontrib>Piazza, Cristina</creatorcontrib><creatorcontrib>Caccamo, Marco</creatorcontrib><creatorcontrib>Sangiovanni-Vincentelli, Alberto L.</creatorcontrib><title>Learning to Generate All Feasible Actions</title><title>IEEE access</title><addtitle>Access</addtitle><description>Modern cyber-physical systems are becoming increasingly complex to model, thus motivating data-driven techniques such as reinforcement learning (RL) to find appropriate control agents. However, most systems are subject to hard constraints such as safety or operational bounds. Typically, to learn to satisfy these constraints, the agent must violate them systematically, which is computationally prohibitive in most systems. Recent efforts aim to utilize feasibility models that assess whether a proposed action is feasible to avoid applying the agent's infeasible action proposals to the system. However, these efforts focus on guaranteeing constraint satisfaction rather than the agent's learning efficiency. 
To improve the learning process, we introduce action mapping, a novel approach that divides the learning process into two steps: first learn feasibility and subsequently, the objective by mapping actions into the sets of feasible actions. This paper focuses on the feasibility part by learning to generate all feasible actions through self-supervised querying of the feasibility model. We train the agent by formulating the problem as a distribution matching problem and deriving gradient estimators for different divergences. Through an illustrative example, a robotic path planning scenario, and a robotic grasping simulation, we demonstrate the agent's proficiency in generating actions across disconnected feasible action sets. By addressing the feasibility step, this paper makes it possible to focus future work on the objective part of action mapping, paving the way for an RL framework that is both safe and efficient.</description><subject>Action mapping</subject><subject>Bandwidth</subject><subject>Cyber-physical systems</subject><subject>Feasibility</subject><subject>Feasibility studies</subject><subject>generative neural network</subject><subject>Grasping</subject><subject>Grasping (robotics)</subject><subject>Learning</subject><subject>Mapping</subject><subject>Neural networks</subject><subject>Optimization</subject><subject>Proposals</subject><subject>Robots</subject><subject>Safety</subject><subject>Self-supervised 
learning</subject><subject>Training</subject><issn>2169-3536</issn><issn>2169-3536</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>ESBDL</sourceid><sourceid>RIE</sourceid><sourceid>DOA</sourceid><recordid>eNpNkDFPwzAQhS0EElXpL4AhEhNDis927HisorZUqsRQmC3HsatUIS52OvDvcUmFeovPp3vvnj6EHgHPAbB8XVTVcrebE0zYnFLBBZU3aEKAy5wWlN9e9fdoFuMBpyrTqBAT9LK1OvRtv88Gn61tb4MebLboumxldWzrLn3M0Po-PqA7p7toZ5d3ij5Xy4_qLd--rzfVYpsbWsghd0Y6JiUGnAKRmpQFFYZJWjLDDVjg1HCOC6EbShgRAA0AMVg4YLKpiaVTtBl9G68P6hjaLx1-lNet-hv4sFc6DK3prLKOMBCOQe0wqzmWlEue7mJZyBpznbyeR69j8N8nGwd18KfQp_iKSJEC4IQhbdFxywQfY7Du_ypgdUasRsTqjFhdECfV06hqrbVXCiaAypL-As2ictg</recordid><startdate>2024</startdate><enddate>2024</enddate><creator>Theile, Mirco</creator><creator>Bernardini, Daniele</creator><creator>Trumpp, Raphael</creator><creator>Piazza, Cristina</creator><creator>Caccamo, Marco</creator><creator>Sangiovanni-Vincentelli, Alberto L.</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>ESBDL</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>7SR</scope><scope>8BQ</scope><scope>8FD</scope><scope>JG9</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0003-1574-8858</orcidid><orcidid>https://orcid.org/0000-0002-0358-8677</orcidid><orcidid>https://orcid.org/0000-0003-2328-044X</orcidid><orcidid>https://orcid.org/0000-0002-9416-9557</orcidid><orcidid>https://orcid.org/0000-0003-3902-7916</orcidid><orcidid>https://orcid.org/0000-0003-1298-8389</orcidid></search><sort><creationdate>2024</creationdate><title>Learning to Generate All Feasible Actions</title><author>Theile, Mirco ; Bernardini, Daniele ; Trumpp, Raphael ; Piazza, Cristina ; Caccamo, Marco ; Sangiovanni-Vincentelli, Alberto L.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c359t-fc9f4990102022b28537c49384c6c1e163c66057ad3242711d112c07f149db2e3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Action mapping</topic><topic>Bandwidth</topic><topic>Cyber-physical systems</topic><topic>Feasibility</topic><topic>Feasibility studies</topic><topic>generative neural network</topic><topic>Grasping</topic><topic>Grasping (robotics)</topic><topic>Learning</topic><topic>Mapping</topic><topic>Neural networks</topic><topic>Optimization</topic><topic>Proposals</topic><topic>Robots</topic><topic>Safety</topic><topic>Self-supervised learning</topic><topic>Training</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Theile, Mirco</creatorcontrib><creatorcontrib>Bernardini, Daniele</creatorcontrib><creatorcontrib>Trumpp, Raphael</creatorcontrib><creatorcontrib>Piazza, Cristina</creatorcontrib><creatorcontrib>Caccamo, 
Marco</creatorcontrib><creatorcontrib>Sangiovanni-Vincentelli, Alberto L.</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005–Present</collection><collection>IEEE Xplore Open Access Journals</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998–Present</collection><collection>IEEE/IET Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Directory of Open Access Journals</collection><jtitle>IEEE access</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Theile, Mirco</au><au>Bernardini, Daniele</au><au>Trumpp, Raphael</au><au>Piazza, Cristina</au><au>Caccamo, Marco</au><au>Sangiovanni-Vincentelli, Alberto L.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Learning to Generate All Feasible Actions</atitle><jtitle>IEEE access</jtitle><stitle>Access</stitle><date>2024</date><risdate>2024</risdate><volume>12</volume><spage>40668</spage><epage>40681</epage><pages>40668-40681</pages><issn>2169-3536</issn><eissn>2169-3536</eissn><coden>IAECCG</coden><abstract>Modern cyber-physical systems are becoming increasingly complex to model, thus motivating data-driven techniques such as reinforcement learning (RL) to find appropriate control agents. 
However, most systems are subject to hard constraints such as safety or operational bounds. Typically, to learn to satisfy these constraints, the agent must violate them systematically, which is computationally prohibitive in most systems. Recent efforts aim to utilize feasibility models that assess whether a proposed action is feasible to avoid applying the agent's infeasible action proposals to the system. However, these efforts focus on guaranteeing constraint satisfaction rather than the agent's learning efficiency. To improve the learning process, we introduce action mapping, a novel approach that divides the learning process into two steps: first learn feasibility and subsequently, the objective by mapping actions into the sets of feasible actions. This paper focuses on the feasibility part by learning to generate all feasible actions through self-supervised querying of the feasibility model. We train the agent by formulating the problem as a distribution matching problem and deriving gradient estimators for different divergences. Through an illustrative example, a robotic path planning scenario, and a robotic grasping simulation, we demonstrate the agent's proficiency in generating actions across disconnected feasible action sets. By addressing the feasibility step, this paper makes it possible to focus future work on the objective part of action mapping, paving the way for an RL framework that is both safe and efficient.</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/ACCESS.2024.3376739</doi><tpages>14</tpages><orcidid>https://orcid.org/0000-0003-1574-8858</orcidid><orcidid>https://orcid.org/0000-0002-0358-8677</orcidid><orcidid>https://orcid.org/0000-0003-2328-044X</orcidid><orcidid>https://orcid.org/0000-0002-9416-9557</orcidid><orcidid>https://orcid.org/0000-0003-3902-7916</orcidid><orcidid>https://orcid.org/0000-0003-1298-8389</orcidid><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 2169-3536
ispartof	IEEE access, 2024, Vol.12, p.40668-40681
issn	2169-3536 2169-3536
language	eng
recordid	cdi_proquest_journals_2973240169
source	Directory of Open Access Journals; IEEE Xplore Open Access Journals; EZB Electronic Journals Library
subjects	Action mapping; Bandwidth; Cyber-physical systems; Feasibility; Feasibility studies; generative neural network; Grasping; Grasping (robotics); Learning; Mapping; Neural networks; Optimization; Proposals; Robots; Safety; Self-supervised learning; Training
title	Learning to Generate All Feasible Actions
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T19%3A42%3A19IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Learning%20to%20Generate%20All%20Feasible%20Actions&rft.jtitle=IEEE%20access&rft.au=Theile,%20Mirco&rft.date=2024&rft.volume=12&rft.spage=40668&rft.epage=40681&rft.pages=40668-40681&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2024.3376739&rft_dat=%3Cproquest_ieee_%3E2973240169%3C/proquest_ieee_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2973240169&rft_id=info:pmid/&rft_ieee_id=10471398&rft_doaj_id=oai_doaj_org_article_ef2417f41bf04b60936960200959b06a&rfr_iscdi=true