Learning Multimodal Confidence for Intention Recognition in Human-Robot Interaction

Bibliographic Details
Published in:	IEEE Robotics and Automation Letters, 2024-09, Vol. 9 (9), p. 7819-7826
Main authors:	Zhao, Xiyuan; Li, Huijun; Miao, Tianyuan; Zhu, Xianyi; Wei, Zhikai; Tan, Lifen; Song, Aiguo
Format:	Article
Language:	English
Subjects:	Algorithms; Feature extraction; Human engineering; Human factors and human-in-the-loop; Human-robot interaction; Machine learning; multimodal confidence learning for opinion pool; multimodal perception for HRI; Reliability; Robot learning; Robotics; Robots; Speech recognition; Task analysis; Uncertainty; Vectors

Description
Summary:	The rapid development of collaborative robotics offers a new way to help elderly people who have difficulties in daily life, by allowing robots to act according to specific human intentions. However, efficient human-robot cooperation requires natural, accurate, and reliable intention recognition in shared environments. The main challenge is to reduce the uncertainty of the fused multimodal intention estimate and to adaptively infer a more reliable result regardless of the current interaction conditions. In this letter, we propose a novel learning-based multimodal fusion framework, Batch Multimodal Confidence Learning for Opinion Pool (BMCLOP). Our approach combines a Bayesian multimodal fusion method with a batch confidence learning algorithm to improve accuracy, uncertainty reduction, and success rate under the given interaction conditions. The framework is generic and practical, and can easily be extended. Our target assistive scenarios consider three modalities: gestures, speech, and gaze, each of which produces a categorical distribution over a finite set of intentions. The proposed method is validated with a six-DoF robot through extensive experiments and exhibits high performance compared to the baselines.
ISSN:	2377-3766
DOI:	10.1109/LRA.2024.3432352
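
The summary describes fusing categorical distributions from gesture, speech, and gaze through an opinion pool with per-modality confidence. As a rough illustration of that general idea only, the sketch below implements a generic confidence-weighted logarithmic opinion pool. It is not the BMCLOP algorithm from the letter: the intention labels, the input distributions, and the fixed confidence weights are all hypothetical, and the letter learns its confidences rather than setting them by hand.

```python
import math


def log_opinion_pool(distributions, weights, eps=1e-12):
    """Fuse categorical distributions with a weighted logarithmic opinion pool.

    fused[i] is proportional to the product over modalities m of p_m[i] ** w_m.
    A weight of 0 removes a modality from the fusion entirely.
    """
    n = len(distributions[0])
    # Work in log space to avoid underflow; eps guards against log(0).
    log_scores = [
        sum(w * math.log(max(p[i], eps)) for p, w in zip(distributions, weights))
        for i in range(n)
    ]
    peak = max(log_scores)
    unnormalised = [math.exp(s - peak) for s in log_scores]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


def entropy(p):
    """Shannon entropy in nats, used here as a simple uncertainty measure."""
    return -sum(q * math.log(q) for q in p if q > 0)


# Hypothetical intention set and per-modality outputs (not from the letter).
INTENTIONS = ["fetch_cup", "open_door", "hand_over_book"]
gesture = [0.6, 0.3, 0.1]
speech = [0.7, 0.2, 0.1]
gaze = [0.3, 0.4, 0.3]

# Hypothetical fixed confidences; gaze is treated as less reliable here.
weights = [1.0, 1.0, 0.3]

fused = log_opinion_pool([gesture, speech, gaze], weights)
best = INTENTIONS[max(range(len(fused)), key=fused.__getitem__)]

print("fused distribution:", [round(x, 3) for x in fused])
print("recognised intention:", best)
print("entropy (nats):", round(entropy(fused), 3))
```

In this toy setting, down-weighting the noisy gaze channel keeps it from pulling the fused estimate toward a different intention, and the fused distribution has lower entropy than any single modality. How the confidences should be learned from batches of interaction data is the subject of the letter itself and is not reproduced here.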