Sequential attention mechanism for weakly supervised video anomaly detection

Bibliographic details
Published in: Expert systems with applications 2023-11, Vol. 230, p. 120599, Article 120599
Main authors: Ullah, Waseem; Min Ullah, Fath U; Ahmad Khan, Zulfiqar; Wook Baik, Sung
Format: Article
Language: English
Description
Abstract: Surveillance cameras are installed across various sectors of a smart city in order to capture ongoing events for monitoring purposes. The analysis of these surveillance videos is an important research topic that involves activity recognition, object detection, anomaly recognition, and other problems. However, anomaly recognition is the most common task in a smart city, and has received significant attention with the aim of ensuring public safety and security. Many works have been published in this field, but these schemes have not been able to provide the desired detection outcomes. Mainstream anomaly recognition methods are heavily dependent on strong supervision to achieve satisfactory performance, which is time-consuming and impractical. With a particular focus on this problem, this article presents a deep convolution neural network (CNN)-based novel anomaly recognition model, in which deep features are extracted from surveillance video frames. These features are forwarded to the proposed temporal convolution network (TCN) that includes a multi-head attention module to enable it to recognise anomalies from these videos. The multi-head temporal attention mechanism enables the model to obtain more key temporal information about the complex surveillance environment. Experiments conducted on standard datasets and a comparison with state-of-the-art approaches demonstrate the effectiveness and superiority of the proposed framework, which achieves increases in accuracy of 0.9%, 1.9%, 0.65%, 0.27%, and 1.5% on the UCF-Crime2local, LAD-2000, RWF-2000, RLVS, and Crowd Violence datasets, respectively. These outcomes indicate the suitability of our method for deployment in real-time surveillance schemes.
ISSN: 0957-4174, 1873-6793
DOI: 10.1016/j.eswa.2023.120599
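
The pipeline summarised in the abstract (per-frame CNN features, a temporal convolution, then multi-head attention over time, ending in one video-level decision) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the two-tap kernel, the dilation of 2, the channel-splitting heads without learned projections, and the mean-pooled sigmoid score are all assumptions chosen to keep the example short, and the random input stands in for real CNN features.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dilated_causal_conv(seq, kernel, dilation):
    """Causal 1-D convolution over time.

    seq is a T x D list of per-frame feature vectors. kernel holds scalar
    taps shared across channels (an assumption for brevity); tap j reads
    the frame j * dilation steps in the past, never a future frame.
    """
    T, D = len(seq), len(seq[0])
    out = []
    for t in range(T):
        row = [0.0] * D
        for j, w in enumerate(kernel):
            src = t - j * dilation
            if src >= 0:  # frames before the start act as zero padding
                for d in range(D):
                    row[d] += w * seq[src][d]
        out.append(row)
    return out


def multi_head_temporal_attention(seq, num_heads):
    """Scaled dot-product self-attention across frames, one pass per head.

    Heads are formed by splitting the channels; a trained model would use
    learned query/key/value projections instead (hypothetical simplification).
    Returns the attended T x D sequence and the per-head T x T weights.
    """
    T, D = len(seq), len(seq[0])
    assert D % num_heads == 0, "feature size must divide evenly into heads"
    d_head = D // num_heads
    out = [[0.0] * D for _ in range(T)]
    weights = []
    for h in range(num_heads):
        lo, hi = h * d_head, (h + 1) * d_head
        head = [row[lo:hi] for row in seq]
        head_weights = []
        for t in range(T):
            scores = [
                sum(a * b for a, b in zip(head[t], head[s])) / math.sqrt(d_head)
                for s in range(T)
            ]
            attn = softmax(scores)
            head_weights.append(attn)
            for d in range(d_head):
                out[t][lo + d] = sum(attn[s] * head[s][d] for s in range(T))
        weights.append(head_weights)
    return out, weights


def video_anomaly_score(seq, num_heads=2):
    """Map a whole clip to one score in (0, 1), matching video-level labels."""
    hidden = dilated_causal_conv(seq, kernel=[0.5, 0.5], dilation=2)
    attended, _ = multi_head_temporal_attention(hidden, num_heads)
    dims = len(attended[0])
    pooled = [sum(row[d] for row in attended) / len(attended) for d in range(dims)]
    logit = sum(pooled) / dims  # untrained placeholder for a classifier head
    return 1.0 / (1.0 + math.exp(-logit))


random.seed(0)
# Stand-in for CNN features: 16 frames, 8 channels each.
frames = [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(16)]
score = video_anomaly_score(frames)
print(f"video-level anomaly score: {score:.3f}")
```

Because the score is produced per clip rather than per frame, a sketch like this only needs a video-level normal/anomalous label during training, which is the weak supervision referred to in the title.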