Generative Upper-Level Policy Imitation Learning With Pareto-Improvement for Energy-Efficient Advanced Machining Systems

The potential intelligence behind advanced machining systems (AMSs) offers positive contributions toward process improvement. Imitation learning (IL) offers an appealing approach to accessing this intelligence by observing demonstrations from skilled technologists. However, existing IL algorithms that implement single policy strategies have yet to consider realistic scenarios for complex AMS tasks, where the available demonstrations may have come from various experts. Moreover, most IL assumes that the expert's policy is optimal, preventing the learning from fulfilling the previously ignored green missions. This article introduces a novel three-phase policy search algorithm based on IL, enabling the learning of heterogeneous expert policies while balancing energy preferences.

Detailed Description

Bibliographic Details
Published in: IEEE Transactions on Neural Networks and Learning Systems, 2024-03, Vol. PP, p. 1-14
Main Authors: Xiao, Qinge; Niu, Ben; Tan, Ying; Yang, Zhile; Chen, Xingzheng
Format: Article
Language: English
Online Access: Order full text
description The potential intelligence behind advanced machining systems (AMSs) offers positive contributions toward process improvement. Imitation learning (IL) offers an appealing approach to accessing this intelligence by observing demonstrations from skilled technologists. However, existing IL algorithms that implement single policy strategies have yet to consider realistic scenarios for complex AMS tasks, where the available demonstrations may have come from various experts. Moreover, most IL assumes that the expert's policy is optimal, preventing the learning from fulfilling the previously ignored green missions. This article introduces a novel three-phase policy search algorithm based on IL, enabling the learning of heterogeneous expert policies while balancing energy preferences. The first phase equips the agent with machining basics through upper-level policy learning, generating an imitation policy distribution with various decision-making principles. The second phase enhances energy conservation capabilities by employing Pareto-improvement learning and fine-tuning the agent's policies to a Pareto-policy manifold. The third phase produces outcomes and amplifies the efficacy of human feedback by utilizing ensemble policies. The experimental results indicate that the proposed method outperforms meta-heuristics, exhibiting superior solution quality and faster computation times compared to four diverse baseline methods, each with diverse samples.
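The three-phase scheme in the abstract (imitate heterogeneous experts, fine-tune toward a Pareto set over quality and energy objectives, then ensemble the survivors) can be illustrated with a deliberately minimal sketch. This is not the paper's implementation: the scalar policy parameter, the two toy objectives, and the sampling around expert "anchors" are all illustrative assumptions standing in for the learned upper-level policy distribution.

```python
import random

def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(objs):
    """Indices of non-dominated objective vectors in objs."""
    return [i for i, a in enumerate(objs)
            if not any(dominates(b, a) for j, b in enumerate(objs) if j != i)]

def three_phase_search(expert_anchors, n_samples=50, seed=0):
    rng = random.Random(seed)
    # Phase 1 (imitation): sample candidate policies -- here just a scalar
    # parameter -- around each heterogeneous expert anchor, mimicking draws
    # from an upper-level imitation policy distribution.
    candidates = [rng.gauss(mu, 0.1)
                  for mu in expert_anchors for _ in range(n_samples)]
    # Phase 2 (Pareto-improvement): score each candidate on two conflicting
    # toy objectives (process error vs. energy use) and keep the empirical
    # Pareto set, discarding dominated imitations of suboptimal experts.
    objs = [((x - 1.0) ** 2, x ** 2) for x in candidates]
    front = pareto_front(objs)
    # Phase 3 (ensemble): aggregate the surviving policies into one output.
    return sum(candidates[i] for i in front) / len(front)
```

With these toy objectives the true Pareto set is the interval [0, 1], so dominated samples from either expert anchor are filtered out before the ensemble step; the non-domination filter is the only part that carries over structurally to any real multi-objective setting.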
doi_str_mv 10.1109/TNNLS.2024.3372641
format Article
publisher IEEE, United States
date 2024-03-18
pmid 38478448
coden ITNNAL
fulltext fulltext_linktorsrc
identifier ISSN: 2162-237X
ispartof IEEE Transactions on Neural Networks and Learning Systems, 2024-03, Vol.PP, p.1-14
issn 2162-237X
eissn 2162-2388
language eng
recordid cdi_pubmed_primary_38478448
source IEEE Electronic Library (IEL)
subjects Advanced machining system (AMS)
Decision making
Energy efficiency
heterogeneous policy search
imitation learning (IL)
Machining
Mathematical models
Metaheuristics
Optimization
pareto-improvement
Task analysis
title Generative Upper-Level Policy Imitation Learning With Pareto-Improvement for Energy-Efficient Advanced Machining Systems
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-15T04%3A49%3A07IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Generative%20Upper-Level%20Policy%20Imitation%20Learning%20With%20Pareto-Improvement%20for%20Energy-Efficient%20Advanced%20Machining%20Systems&rft.jtitle=IEEE%20transaction%20on%20neural%20networks%20and%20learning%20systems&rft.au=Xiao,%20Qinge&rft.date=2024-03-18&rft.volume=PP&rft.spage=1&rft.epage=14&rft.pages=1-14&rft.issn=2162-237X&rft.eissn=2162-2388&rft.coden=ITNNAL&rft_id=info:doi/10.1109/TNNLS.2024.3372641&rft_dat=%3Cproquest_RIE%3E2957163580%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2957163580&rft_id=info:pmid/38478448&rft_ieee_id=10472292&rfr_iscdi=true