Violence Detection Enhancement by Involving Convolutional Block Attention Modules into Various Deep Learning Architectures: Comprehensive Case Study for UBI-Fights Dataset

Published in: IEEE Access, 2023-01, Vol. 11, p. 1-1
Authors: Abbass, Mahmoud Abdelkader Bashery; Kang, Hyun-Soo
Format: Article
Language: English
Abstract: Violence detection in surveillance videos is a complicated task because it requires extracting spatio-temporal features from videos with differing environments and camera perspectives. In this paper, several architectures are proposed to perform this task with high performance, using the UBI-Fights dataset as a comprehensive case study. The proposed architectures combine Convolutional Block Attention Modules (CBAM) with other simple layers (e.g., ConvLSTM2D or Conv2D&LSTM), and Categorical Focal Loss (CFL) is used as the loss function during training to increase the focus on the most important features. The architectures are evaluated mainly with the Area Under the Curve (AUC) and Equal Error Rate (EER) metrics, which measure how reliably an architecture identifies violence while keeping the overlap between classes low. The results show that the proposed architectures outperform the state of the art: for example, the Conv2D&LSTM-based architecture achieves an AUC of 0.9493 and an EER of 0.0507, surpassing most of the other proposed architectures as well as the state-of-the-art performance.
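
The CBAM blocks referenced in the abstract follow the channel-then-spatial attention design of Woo et al. (2018). Below is a minimal Keras sketch of such a block; the reduction ratio and the 7x7 spatial kernel are the commonly used defaults from that paper, not values confirmed by this record.

import tensorflow as tf
from tensorflow.keras import layers

def cbam_block(x, reduction=8):
    """Channel attention followed by spatial attention (Woo et al., 2018)."""
    channels = x.shape[-1]

    # Channel attention: a shared two-layer MLP scores global average- and
    # max-pooled descriptors; their sum is squashed to per-channel weights.
    dense1 = layers.Dense(channels // reduction, activation="relu")
    dense2 = layers.Dense(channels)
    avg = dense2(dense1(layers.GlobalAveragePooling2D()(x)))
    mx = dense2(dense1(layers.GlobalMaxPooling2D()(x)))
    ca = layers.Activation("sigmoid")(layers.Add()([avg, mx]))
    x = layers.Multiply()([x, layers.Reshape((1, 1, channels))(ca)])

    # Spatial attention: a 7x7 convolution over the channel-wise average
    # and max maps yields a per-pixel weight in [0, 1].
    avg_map = layers.Lambda(lambda t: tf.reduce_mean(t, axis=-1, keepdims=True))(x)
    max_map = layers.Lambda(lambda t: tf.reduce_max(t, axis=-1, keepdims=True))(x)
    sa = layers.Conv2D(1, kernel_size=7, padding="same",
                       activation="sigmoid")(layers.Concatenate()([avg_map, max_map]))
    return layers.Multiply()([x, sa])

In the ConvLSTM2D-based variants mentioned in the abstract, such a block would plausibly be applied per frame (e.g., via layers.TimeDistributed) before the recurrent layer; the exact placement is an assumption, as the record does not specify it.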
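Categorical Focal Loss down-weights well-classified examples so that training gradient mass concentrates on hard ones, which suits the heavy class imbalance of violence datasets. A minimal sketch follows, assuming one-hot labels and softmax outputs; the alpha and gamma defaults are the usual values from Lin et al. (2017), not values reported in this record.

import tensorflow as tf

def categorical_focal_loss(alpha=0.25, gamma=2.0):
    """Return a Keras-compatible loss: -alpha * (1 - p_t)^gamma * log(p_t)."""
    def loss(y_true, y_pred):
        eps = tf.keras.backend.epsilon()
        y_pred = tf.clip_by_value(y_pred, eps, 1.0 - eps)
        cross_entropy = -y_true * tf.math.log(y_pred)
        # The (1 - p)^gamma factor shrinks the loss of confident, correct
        # predictions, shifting focus toward hard or misclassified samples.
        return tf.reduce_sum(alpha * tf.pow(1.0 - y_pred, gamma) * cross_entropy,
                             axis=-1)
    return loss

It plugs in like any built-in loss, e.g. model.compile(optimizer="adam", loss=categorical_focal_loss()).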
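Both reported metrics derive from the ROC curve: AUC summarizes it, while the EER is the operating point where the false positive rate equals the false negative rate. A sketch of computing both from per-video scores with scikit-learn; the variable names are placeholders, not artifacts from the paper.

import numpy as np
from sklearn.metrics import roc_curve, auc

def auc_and_eer(labels, scores):
    """labels: 1 = violent, 0 = non-violent; scores: predicted violence scores."""
    fpr, tpr, _ = roc_curve(labels, scores)
    # EER lies where FPR crosses FNR (= 1 - TPR) along the ROC curve.
    idx = np.nanargmin(np.abs(fpr - (1.0 - tpr)))
    return auc(fpr, tpr), (fpr[idx] + (1.0 - tpr[idx])) / 2.0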
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2023.3267409