Video Anomaly Detection via Motion Completion Diffusion for Intelligent Surveillance System

Bibliographic details
Published in: IEEE Sensors Journal 2024-11, Vol.24 (21), p.35928-35938
Main authors: Xue, Zhenhua; Hu, Ronghuai; Huang, Chao; Wei, Zhenlin
Format: Article
Language: English
Description
Abstract: Detecting abnormal behaviors in video from surveillance cameras is a crucial and challenging task in a range of public and industrial manufacturing scenarios. Unlike conventional techniques that use raw video data from the camera sensor, pose-based approaches use a low-dimensional, highly structured skeleton feature, ensuring immunity to background disturbances and improving detection efficiency. Nevertheless, existing pose-based methods mainly rely on an encoder-decoder architecture for video anomaly detection (VAD), which remains unsatisfactory because it does not sufficiently cover the different variants of motion patterns. To tackle these challenges, we propose a novel motion completion diffusion (MCDiffusion) model for anomaly detection using motion sequences extracted from camera sensor data. Our MCDiffusion is characterized by high-quality sample generation and robust pattern coverage. Specifically, our model conditions on observed motion to provide more accurate and controllable motion completion results. We train a diffusion model based on motion sequence masking, in which the model gradually generates the masked motion from random noise to learn normal patterns. An anomaly is determined based on the error between the masked motion and its generated completion. Additionally, we model human pose as a hierarchical spatio-temporal graph to capture dynamic interactions among individuals and the pose within each individual. Our MCDiffusion achieves state-of-the-art (SOTA) performance on four widely used VAD datasets, thus setting a new benchmark for online anomaly detection with video cameras.
ISSN: 1530-437X, 1558-1748
DOI:10.1109/JSEN.2024.3453437
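
Illustrative note: the abstract scores an anomaly by the error between a masked motion segment and the model's completion of it. The sketch below shows only that scoring step, under loud assumptions. The names `interpolate_completion` and `anomaly_score`, the array layout (frames, joints, 2D coordinates), and the mean-squared-error metric are hypothetical choices made for illustration. In particular, the linear-interpolation completer is a simple stand-in and is not the paper's diffusion model.

```python
import numpy as np

def interpolate_completion(observed, mask):
    """Hypothetical stand-in for a learned completion model.

    Fills the masked frames by linear interpolation over time between the
    observed frames. This is NOT a diffusion model; it only plays the role
    of "something that completes masked motion from observed motion".
    """
    num_frames = observed.shape[0]
    t = np.arange(num_frames)
    flat = observed.reshape(num_frames, -1).copy()
    keep = ~mask
    for d in range(flat.shape[1]):
        flat[mask, d] = np.interp(t[mask], t[keep], flat[keep, d])
    return flat.reshape(observed.shape)

def anomaly_score(motion, mask, complete_fn=interpolate_completion):
    """Mean squared error between the true masked frames and their completion.

    motion: array of shape (frames, joints, 2), an assumed pose layout.
    mask:   boolean array of shape (frames,), True where frames are hidden.
    """
    observed = motion.copy()
    observed[mask] = 0.0  # hide the masked frames from the completer
    completed = complete_fn(observed, mask)
    err = (completed[mask] - motion[mask]) ** 2
    return float(err.mean())

# Toy data: every joint moves linearly in time, so interpolation is exact.
T, J = 12, 17
t = np.linspace(0.0, 1.0, T)
smooth = np.stack(
    [np.stack([t + j, 2 * t - j], axis=-1) for j in range(J)], axis=1
)  # shape (T, J, 2)

mask = np.zeros(T, dtype=bool)
mask[4:8] = True  # hide frames 4..7

# Inject a sudden jump inside the masked window to mimic abnormal motion.
abnormal = smooth.copy()
abnormal[5:7] += 3.0

normal_score = anomaly_score(smooth, mask)
abnormal_score = anomaly_score(abnormal, mask)
print(normal_score, abnormal_score)
```

On this toy data the sequence with the injected jump receives the higher score, because the completer reconstructs the smooth trajectory from the observed frames and the jump shows up as completion error.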