Generative Upper-Level Policy Imitation Learning With Pareto-Improvement for Energy-Efficient Advanced Machining Systems
The potential intelligence behind advanced machining systems (AMSs) offers positive contributions toward process improvement. Imitation learning (IL) offers an appealing approach to accessing this intelligence by observing demonstrations from skilled technologists. However, existing IL algorithms that implement single policy strategies have yet to consider realistic scenarios for complex AMS tasks, where the available demonstrations may have come from various experts.
Saved in:
Published in: | IEEE Transactions on Neural Networks and Learning Systems, 2024-03, Vol.PP, p.1-14 |
---|---|
Main authors: | Xiao, Qinge ; Niu, Ben ; Tan, Ying ; Yang, Zhile ; Chen, Xingzheng |
Format: | Article |
Language: | eng |
Subjects: | Advanced machining system (AMS) ; Decision making ; Energy efficiency ; heterogeneous policy search ; imitation learning (IL) ; Machining ; Mathematical models ; Metaheuristics ; Optimization ; pareto-improvement ; Task analysis |
Online access: | Order full text |
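The abstract above outlines a three-phase policy search: imitating heterogeneous expert policies, Pareto-improvement fine-tuning toward energy-aware trade-offs, and ensembling the resulting policies. As an illustrative sketch only — the toy objectives `quality`/`energy`, the perturbation-based update, and all parameter choices below are assumptions of this note, not the paper's actual models or update rules — the three-phase structure might look like:

```python
import numpy as np

rng = np.random.default_rng(0)

def quality(theta):
    # Toy machining-quality surrogate: higher is better, peaks at theta = 1.
    return -np.sum((theta - 1.0) ** 2)

def energy(theta):
    # Toy energy surrogate: lower is better, minimal at theta = 0.
    return np.sum(theta ** 2)

def pareto_front(pop):
    """Keep policies not dominated on (maximize quality, minimize energy)."""
    front = []
    for i, a in enumerate(pop):
        dominated = any(
            quality(b) >= quality(a) and energy(b) <= energy(a)
            and (quality(b) > quality(a) or energy(b) < energy(a))
            for j, b in enumerate(pop) if j != i
        )
        if not dominated:
            front.append(a)
    return front

# Phase 1: imitate heterogeneous experts, yielding a *distribution* of
# policies -- here simply behaviour cloning toward three expert parameter sets.
experts = [rng.normal(loc=m, scale=0.1, size=3) for m in (0.5, 1.0, 1.5)]
policies = [e + rng.normal(scale=0.05, size=3) for e in experts for _ in range(5)]

# Phase 2: Pareto-improvement -- perturb a policy and accept the move only
# if neither objective worsens, nudging policies toward the Pareto set.
for _ in range(200):
    k = int(rng.integers(len(policies)))
    cand = policies[k] + rng.normal(scale=0.02, size=3)
    if quality(cand) >= quality(policies[k]) and energy(cand) <= energy(policies[k]):
        policies[k] = cand

# Phase 3: ensemble the non-dominated policies into one recommendation.
front = pareto_front(policies)
ensemble = np.mean(front, axis=0)
```

The accept-only-if-no-objective-worsens rule in phase 2 is the defining property of a Pareto improvement; the paper's fine-tuning toward a "Pareto-policy manifold" is a far richer version of this idea.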
container_end_page | 14 |
---|---|
container_issue | |
container_start_page | 1 |
container_title | IEEE Transactions on Neural Networks and Learning Systems |
container_volume | PP |
creator | Xiao, Qinge ; Niu, Ben ; Tan, Ying ; Yang, Zhile ; Chen, Xingzheng |
description | The potential intelligence behind advanced machining systems (AMSs) offers positive contributions toward process improvement. Imitation learning (IL) offers an appealing approach to accessing this intelligence by observing demonstrations from skilled technologists. However, existing IL algorithms that implement single policy strategies have yet to consider realistic scenarios for complex AMS tasks, where the available demonstrations may have come from various experts. Moreover, most IL assumes that the expert's policy is optimal, preventing the learning from fulfilling the previously ignored green missions. This article introduces a novel three-phase policy search algorithm based on IL, enabling the learning of heterogeneous expert policies while balancing energy preferences. The first phase equips the agent with machining basics through upper-level policy learning, generating an imitation policy distribution with various decision-making principles. The second phase enhances energy conservation capabilities by employing Pareto-improvement learning and fine-tuning the agent's policies to a Pareto-policy manifold. The third phase produces outcomes and amplifies the efficacy of human feedback by utilizing ensemble policies. The experimental results indicate that the proposed method outperforms meta-heuristics, exhibiting superior solution quality and faster computation times compared to four diverse baseline methods, each with diverse samples. |
doi_str_mv | 10.1109/TNNLS.2024.3372641 |
format | Article |
fullrecord | PMID: 38478448 ; DOI: 10.1109/TNNLS.2024.3372641 ; IEEE document ID: 10472292 ; ProQuest ID: 2957163580 ; ISSN: 2162-237X ; EISSN: 2162-2388 ; CODEN: ITNNAL ; Publisher: IEEE (United States) ; Publication date: 2024-03-18 ; ORCID iDs: 0000-0003-1235-073X, 0000-0001-5822-8743, 0000-0001-8580-534X, 0000-0001-8243-4731 |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 2162-237X |
ispartof | IEEE Transactions on Neural Networks and Learning Systems, 2024-03, Vol.PP, p.1-14 |
issn | 2162-237X 2162-2388 |
language | eng |
recordid | cdi_pubmed_primary_38478448 |
source | IEEE Electronic Library (IEL) |
subjects | Advanced machining system (AMS) ; Decision making ; Energy efficiency ; heterogeneous policy search ; imitation learning (IL) ; Machining ; Mathematical models ; Metaheuristics ; Optimization ; pareto-improvement ; Task analysis |
title | Generative Upper-Level Policy Imitation Learning With Pareto-Improvement for Energy-Efficient Advanced Machining Systems |