Generative Upper-Level Policy Imitation Learning With Pareto-Improvement for Energy-Efficient Advanced Machining Systems
Published in: | IEEE Transactions on Neural Networks and Learning Systems, March 2024, Vol. PP, pp. 1-14 |
---|---|
Main authors: | , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Abstract: | The potential intelligence behind advanced machining systems (AMSs) can contribute to process improvement. Imitation learning (IL) offers an appealing way to access this intelligence by observing demonstrations from skilled technologists. However, existing IL algorithms, which learn a single policy, have yet to address realistic scenarios for complex AMS tasks, where the available demonstrations may come from several different experts. Moreover, most IL methods assume that the expert's policy is optimal, which prevents the learner from pursuing the green (energy-saving) objectives that were previously ignored. This article introduces a novel three-phase policy search algorithm based on IL that learns heterogeneous expert policies while balancing energy preferences. The first phase equips the agent with machining basics through upper-level policy learning, producing a distribution of imitation policies that embody different decision-making principles. The second phase improves energy conservation by applying Pareto-improvement learning to fine-tune the agent's policies toward a Pareto-policy manifold. The third phase produces the final outcomes and amplifies the efficacy of human feedback by using ensemble policies. Experimental results indicate that the proposed method outperforms meta-heuristics, achieving better solution quality and shorter computation times than four diverse baseline methods across diverse samples. |
---|---|
ISSN: | 2162-237X (print), 2162-2388 (electronic) |
DOI: | 10.1109/TNNLS.2024.3372641 |
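The abstract's second phase rests on the notion of a Pareto improvement between competing objectives such as machining performance and energy use. As a generic illustration only, and not the authors' algorithm, the following minimal sketch shows the standard Pareto-improvement test for objectives that are minimized, plus a filter that keeps only non-dominated candidate policies. The objective names (time, energy), the functions `pareto_improves` and `non_dominated`, and the candidate scores are all hypothetical.

```python
from typing import List, Sequence, Tuple

# An objective vector holds costs to minimize, e.g. (machining_time, energy).
# These objective names are illustrative assumptions, not from the article.
Objectives = Tuple[float, ...]


def pareto_improves(new: Objectives, old: Objectives) -> bool:
    """Return True if `new` is a Pareto improvement over `old` (minimization):
    no objective gets worse and at least one strictly improves."""
    if len(new) != len(old):
        raise ValueError("objective vectors must have the same length")
    no_worse = all(n <= o for n, o in zip(new, old))
    strictly_better = any(n < o for n, o in zip(new, old))
    return no_worse and strictly_better


def non_dominated(points: Sequence[Objectives]) -> List[Objectives]:
    """Keep only the points that no other point Pareto-improves upon.
    This is a simple O(n^2) scan, adequate for small candidate sets."""
    return [p for p in points if not any(pareto_improves(q, p) for q in points)]


if __name__ == "__main__":
    # Hypothetical (time, energy) scores for four candidate policies.
    candidates = [(10.0, 5.0), (8.0, 6.0), (11.0, 5.5), (9.0, 4.0)]
    print(non_dominated(candidates))  # prints [(8.0, 6.0), (9.0, 4.0)]
```

Here (10.0, 5.0) and (11.0, 5.5) are dropped because (9.0, 4.0) is no worse on both objectives and strictly better on at least one, while (8.0, 6.0) and (9.0, 4.0) trade time against energy and so both remain on the front.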