Fine-Tuning Multimodal Transformer Models for Generating Actions in Virtual and Real Environments

In this work, we propose and investigate an original approach to using a pre-trained multimodal transformer of a specialized architecture for controlling a robotic agent in an object manipulation task based on language instruction, which we refer to as RozumFormer. Our model is based on a bimodal (t...

Bibliographic Details
Published in:	IEEE Access 2023-01, Vol.11, p.1-1
Main authors:	Staroverov, Aleksei; Gorodetsky, Andrey S.; Krishtopik, Andrei S.; Izmesteva, Uliana A.; Yudin, Dmitry A.; Kovalev, Alexey K.; Panov, Aleksandr I.
Format:	Article
Language:	English
Subjects:	Action generation; Adaptation models; bimodal transformer models; Cameras; Datasets; Image manipulation; Image processing; intelligent agent; Language instruction; Language modeling; Manipulators; Multimodality; Reagents; Reinforcement; Robot control; Robot kinematics; Robot vision systems; robotic manipulator arm control; Robots; Simulation; Task analysis; Transformers; Visual tasks
Online access:	Full text