Spatiotemporal Representation Learning for Video Anomaly Detection

Bibliographic Details
Published in: IEEE Access, 2020, Vol. 8, pp. 25531-25542
Authors: Li, Zhaoyan; Li, Yaoshun; Gao, Zhisheng
Format: Article
Language: English
Online Access: Full text
Description
Abstract: Video-based detection of anomalous human behavior is widely studied in fields such as security, medical care, education, and energy. However, several open problems remain: large, complicated models are difficult to train, and the accuracy and speed of anomalous behavior detection are often insufficient. This paper proposes a spatiotemporal representation learning model. First, the spatiotemporal features of the video are extracted by the constructed multi-scale 3D convolutional neural network. Then the scene background is modeled by a high-dimensional Gaussian mixture model and used for anomaly detection. Finally, the precise position of anomalous behavior in the video is obtained by computing the position of the last output feature, that is, the position of its receptive field. The proposed model does not require specific training, and it offers high versatility, fast computation, and high detection accuracy. We validated the proposed algorithm on two representative surveillance datasets, Subway and UCSD Ped2. Results show that the proposed algorithm achieves a detection rate of 18 FPS with common computing resources, meeting real-time requirements. Moreover, compared with similar methods, it achieves competitive results in both frame-level and pixel-level accuracy.
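
The abstract outlines a three-stage pipeline: multi-scale 3D convolutional feature extraction, Gaussian-mixture background modeling for anomaly scoring, and localization via the receptive field of the last feature map. The sketch below illustrates that flow under assumptions not stated in the record: it uses PyTorch and scikit-learn, and the branch kernels, mixture component count, and score threshold are illustrative placeholders rather than the paper's actual settings.

# Minimal sketch of the pipeline described in the abstract.
# All architectural details and thresholds are assumptions for illustration.
import numpy as np
import torch
import torch.nn as nn
from sklearn.mixture import GaussianMixture

class MultiScale3DFeatures(nn.Module):
    """Hypothetical multi-scale 3D convolutional feature extractor."""
    def __init__(self, in_channels=3, out_channels=32):
        super().__init__()
        # Two parallel branches with different kernel sizes stand in for the
        # "multi-scale" idea; the paper's exact architecture is not reproduced.
        self.branch_small = nn.Conv3d(in_channels, out_channels,
                                      kernel_size=(3, 3, 3), stride=(1, 2, 2),
                                      padding=(1, 1, 1))
        self.branch_large = nn.Conv3d(in_channels, out_channels,
                                      kernel_size=(5, 5, 5), stride=(1, 2, 2),
                                      padding=(2, 2, 2))

    def forward(self, clip):
        # clip: (batch, channels, time, height, width)
        feats = torch.cat([self.branch_small(clip),
                           self.branch_large(clip)], dim=1)
        # Collapse the temporal axis so each spatial cell yields one descriptor.
        return feats.mean(dim=2)  # (batch, 2*out_channels, H', W')

def fit_background_model(normal_clips, extractor, n_components=5):
    """Fit a Gaussian mixture to features from anomaly-free clips."""
    with torch.no_grad():
        feats = [extractor(c) for c in normal_clips]
    # Flatten spatial cells into a set of feature vectors.
    vectors = torch.cat([f.permute(0, 2, 3, 1).reshape(-1, f.shape[1])
                         for f in feats]).numpy()
    gmm = GaussianMixture(n_components=n_components, covariance_type="diag")
    gmm.fit(vectors)
    return gmm

def detect_anomalies(clip, extractor, gmm, threshold=-50.0):
    """Score each spatial cell; low likelihood under the background mixture
    flags a potential anomaly. The threshold is a placeholder."""
    with torch.no_grad():
        feats = extractor(clip)                      # (1, C, H', W')
    _, c, h, w = feats.shape
    vectors = feats.permute(0, 2, 3, 1).reshape(-1, c).numpy()
    scores = gmm.score_samples(vectors).reshape(h, w)
    anomalous_cells = np.argwhere(scores < threshold)
    return scores, anomalous_cells

def cell_to_region(i, j, stride=2, field=5):
    """Map a feature cell back to an approximate pixel region (the receptive
    field of the larger branch), giving coarse spatial localization."""
    top, left = i * stride, j * stride
    return top, left, top + field, left + field

In this sketch, localization follows directly from the feature geometry: because the extractor is fully convolutional, each anomalous cell index maps back to a frame region through the stride and kernel size, which mirrors the receptive-field-based positioning the abstract describes.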
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2020.2970497