Towards Open-Set NLP-Based Multi-Level Planning for Robotic Tasks

Bibliographic Details
Published in: Applied Sciences, 2024-11, Vol. 14 (22), p. 10717
Authors: Racinskis, Peteris; Vismanis, Oskars; Zinars, Toms Eduards; Arents, Janis; Greitans, Modris
Format: Article
Language: English
Online access: Full text
Description
Abstract: This paper outlines a conceptual design for a multi-level natural language-based planning system and describes a demonstrator. The main goal of the demonstrator is to serve as a proof of concept by accomplishing end-to-end execution in a real-world environment and by showing a novel way of interfacing an LLM-based planner with open-set semantic maps. The target use case is executing sequences of tabletop pick-and-place operations using an industrial robot arm and an RGB-D camera. The demonstrator processes unstructured user prompts, produces high-level action plans, queries a map for object positions and grasp poses using open-set semantics, and then uses the resulting outputs to parametrize and execute a sequence of action primitives. The paper describes in detail the overall system structure, high-level planning using language models, low-level planning through action and motion primitives, and the implementation of two different environment modeling schemes (2.5D and fully 3D). The impact of quantizing image embeddings on object recall is assessed, and high-level planner performance is evaluated on a small reference scene dataset. On the simple, constrained test command dataset, the high-level planner achieves a total success rate of 96.40%, while the semantic maps exhibit maximum recall rates of 94.69% and 92.29% for the 2.5D and 3D versions, respectively.
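
The pipeline described in the abstract (LLM-based high-level planning, grounding through an open-set semantic map query, and execution via parametrized action primitives) can be summarized as a short control-flow sketch. The Python below is purely illustrative and assumes hypothetical interfaces; plan_with_llm, SemanticMap.query, and execute_primitive are invented names for this sketch, not the paper's actual API.

    """Illustrative sketch of the planning pipeline described in the abstract."""
    from dataclasses import dataclass

    @dataclass
    class Grasp:
        """Object position and gripper pose returned by a semantic map query."""
        position: tuple[float, float, float]            # (x, y, z) in the robot's frame
        orientation: tuple[float, float, float, float]  # quaternion (x, y, z, w)

    class SemanticMap:
        """Open-set semantic map built from RGB-D data (2.5D or fully 3D).

        Objects are retrieved by comparing a free-text label's embedding against
        stored image embeddings, so arbitrary labels can be queried at runtime.
        """
        def query(self, label: str) -> Grasp:
            # Stub: a real map would match `label` against stored image embeddings.
            return Grasp(position=(0.4, 0.1, 0.05),
                         orientation=(0.0, 0.0, 0.0, 1.0))

    def plan_with_llm(prompt: str) -> list[dict]:
        """Turn an unstructured user prompt into a high-level plan.

        Stub: a real system would call an LLM and parse its output into an
        ordered list of steps with object and target labels.
        """
        return [{"action": "pick_place", "object": "red mug", "target": "tray"}]

    def execute_primitive(action: str, grasp: Grasp, target: str) -> None:
        """Parametrize and run a low-level action/motion primitive on the arm."""
        print(f"{action}: grasp at {grasp.position}, place on '{target}'")

    def run(prompt: str, world: SemanticMap) -> None:
        # High-level plan -> grounding via the map -> low-level primitives.
        for step in plan_with_llm(prompt):
            grasp = world.query(step["object"])
            execute_primitive(step["action"], grasp, step["target"])

    if __name__ == "__main__":
        run("Put the red mug on the tray.", SemanticMap())

The point of the sketch is the separation of concerns the abstract emphasizes: the LLM never needs metric coordinates, and the map never needs to understand the task, since grounding happens only at the interface between the two levels.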
ISSN: 2076-3417
DOI: 10.3390/app142210717