Deep-Reinforcement-Learning-Based Object Transportation Using Task Space Decomposition

This paper presents a novel object transportation method using deep reinforcement learning (DRL) and the task space decomposition (TSD) method. Most previous studies on DRL-based object transportation worked well only in the specific environment where a robot learned how to transport an object. Anot...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	Sensors (Basel, Switzerland) Switzerland), 2023-05, Vol.23 (10), p.4807
1. Verfasser:	Eoh, Gyuho
Format:	Artikel
Sprache:	eng
Schlagworte:	Algorithms Curricula Decomposition Deep learning deep reinforcement learning Learning Methods object transportation Planning Robots School environment Task space task space decomposition Transportation industry
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	This paper presents a novel object transportation method using deep reinforcement learning (DRL) and the task space decomposition (TSD) method. Most previous studies on DRL-based object transportation worked well only in the specific environment where a robot learned how to transport an object. Another drawback was that DRL only converged in relatively small environments. This is because the existing DRL-based object transportation methods are highly dependent on learning conditions and training environments; they cannot be applied to large and complicated environments. Therefore, we propose a new DRL-based object transportation that decomposes a difficult task space to be transported into simple multiple sub-task spaces using the TSD method. First, a robot sufficiently learned how to transport an object in a standard learning environment (SLE) that has small and symmetric structures. Then, a whole-task space was decomposed into several sub-task spaces by considering the size of the SLE, and we created sub-goals for each sub-task space. Finally, the robot transported an object by sequentially occupying the sub-goals. The proposed method can be extended to a large and complicated new environment as well as the training environment without additional learning or re-learning. Simulations in different environments are presented to verify the proposed method, such as a long corridor, polygons, and a maze.
ISSN:	1424-8220 1424-8220
DOI:	10.3390/s23104807