A Novel Hierarchical Soft Actor-Critic Algorithm for Multi-Logistics Robots Task Allocation

Bibliographic Details
Published in:	IEEE Access, 2021, Vol. 9, pp. 42568-42582
Main authors:	Tang, Hengliang; Wang, Anqi; Xue, Fei; Yang, Jiaxin; Cao, Yang
Format:	Article
Language:	English
Subjects:	actor-critic; Algorithms; Computer Science; Computer Science, Information Systems; deep reinforcement learning; Dynamic scheduling; Engineering; Engineering, Electrical & Electronic; Heuristic algorithms; hierarchical reinforcement learning; Logistics; Machine learning; Multi-logistics robot; Reinforcement learning; Resource management; Robots; Science & Technology; task allocation; Task analysis; Technology; Telecommunications
Online access:	Full text

Description
Summary:	In intelligent unmanned warehouse goods-to-man systems, task allocation strongly influences efficiency because of the dynamic behavior of AGV robots and orders. The paper presents a hierarchical Soft Actor-Critic algorithm to solve the dynamic scheduling problem of order picking. The proposed method is based on the classic Soft Actor-Critic (SAC) algorithm and hierarchical reinforcement learning. The model is trained at different time scales by introducing sub-goals: the top level learns a policy, and the bottom level learns a policy to achieve the sub-goals. The actor of the controller aims to maximize the expected intrinsic reward while also maximizing entropy, that is, to succeed at the sub-goals while acting as randomly as possible. Finally, simulation experiments in different scenes show that the method enables multi-logistics AGV robots to work together and improves the reward in sparse environments by about 2.61 times compared with the SAC algorithm.
ISSN:	2169-3536
DOI:	10.1109/ACCESS.2021.3062457
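
Illustrative note: the summary states that the controller's actor maximizes expected intrinsic reward together with entropy. The sketch below shows that trade-off for a single state with discrete actions. It is not taken from the article: the function names, the one-step discrete formulation, the temperature parameter alpha, and the example values are all assumptions chosen for illustration.

```python
import math


def policy_entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    # Actions with zero probability contribute nothing to the entropy.
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def soft_objective(probs, intrinsic_q, alpha):
    """Expected intrinsic value plus alpha times the policy entropy.

    probs       -- action probabilities of the controller's actor (hypothetical)
    intrinsic_q -- estimated intrinsic value of each action (hypothetical)
    alpha       -- entropy temperature, assumed here to be a fixed constant
    """
    expected_q = sum(p * q for p, q in zip(probs, intrinsic_q))
    return expected_q + alpha * policy_entropy(probs)


# Hypothetical two-action example: action 1 has the higher intrinsic value.
intrinsic_q = [1.0, 3.0]

# With a small temperature, the greedy policy scores higher.
greedy_low = soft_objective([0.0, 1.0], intrinsic_q, alpha=0.5)
uniform_low = soft_objective([0.5, 0.5], intrinsic_q, alpha=0.5)

# With a large temperature, the entropy bonus makes the uniform policy score higher.
greedy_high = soft_objective([0.0, 1.0], intrinsic_q, alpha=4.0)
uniform_high = soft_objective([0.5, 0.5], intrinsic_q, alpha=4.0)
```

Under these assumptions, the temperature alpha sets how much randomness is worth relative to intrinsic reward: a low value favors the action that best serves the sub-goal, while a high value favors keeping the policy spread across actions.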