CLUE-AI: A Convolutional Three-stream Anomaly Identification Framework for Robot Manipulation

Bibliographic details
Main authors:	Altan, Dogan; Sariel, Sanem
Format:	Article
Language:	English
Subjects:	Computer Science - Robotics

Description
Summary:	Robot safety has been a prominent research topic in recent years as robots become more involved in daily tasks. It is crucial to devise the safety mechanisms that enable service robots to be aware of and react to anomalies (i.e., unexpected deviations from intended outcomes) that arise during the execution of these tasks. Detecting and identifying these anomalies is an essential step towards fulfilling these requirements. Although several architectures have been proposed for anomaly detection, identification has not yet been thoroughly investigated. This task is challenging since indicators may appear long before anomalies are detected. In this paper, we propose a ConvoLUtional threE-stream Anomaly Identification (CLUE-AI) framework to address this problem. The framework fuses visual, auditory and proprioceptive data streams to identify everyday object manipulation anomalies. A stream of 2D images gathered through an RGB-D camera placed on the head of the robot is processed within a self-attention-enabled visual stage to capture visual anomaly indicators. The auditory modality, provided by a microphone placed on the robot's lower torso, is processed within a purpose-designed convolutional neural network (CNN) in the auditory stage. Finally, the force applied by the gripper and the gripper state are processed within a CNN to obtain proprioceptive features. These outputs are then combined with a late fusion scheme. Our novel three-stream framework design is analyzed on everyday object manipulation tasks with a Baxter humanoid robot in a semi-structured setting. The results indicate that the framework achieves an F-score of 94%, outperforming the baselines in classifying anomalies that arise during runtime.
DOI:	10.48550/arxiv.2203.08746
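
Illustrative note: the summary describes three per-modality stages whose outputs are combined with a late fusion scheme. The sketch below is one common way such a scheme can look, not the authors' implementation: it assumes each stage has already produced a fixed-length feature vector, concatenates the three vectors, and applies a single linear layer followed by a softmax. The feature sizes, the number of anomaly classes, the random weights, and the function names (`softmax`, `late_fusion`) are all hypothetical.

```python
import math
import random


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(visual, auditory, proprio, weights, bias):
    """Concatenate per-stream features, then classify with one linear layer.

    Assumption: each argument is the output of a separate, already-run
    stage (visual, auditory, proprioceptive). The record above does not
    specify how the paper's fusion layer is actually built.
    """
    fused = list(visual) + list(auditory) + list(proprio)
    scores = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(scores)


# Hypothetical feature vectors standing in for the three stage outputs.
random.seed(0)
visual = [random.random() for _ in range(4)]
auditory = [random.random() for _ in range(3)]
proprio = [random.random() for _ in range(2)]

# Hypothetical classifier: 3 anomaly classes over 4 + 3 + 2 = 9 fused features.
num_classes = 3
fused_size = len(visual) + len(auditory) + len(proprio)
weights = [
    [random.uniform(-1.0, 1.0) for _ in range(fused_size)]
    for _ in range(num_classes)
]
bias = [0.0] * num_classes

probs = late_fusion(visual, auditory, proprio, weights, bias)
predicted = max(range(num_classes), key=probs.__getitem__)
print("class probabilities:", probs)
print("predicted class index:", predicted)
```

In a trained system the weights would be learned jointly with, or on top of, the three stages; here they are random and serve only to show the data flow from separate streams to a single decision.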