Computationally constrained audio-based violence detection through transfer learning and data augmentation techniques

Bibliographic details
Published in:	Applied Acoustics, 2023-10, Vol. 213, p. 109638, Article 109638
Main authors:	Zhu-Zhou, Fangfang; Tejera-Berengué, Diana; Gil-Pita, Roberto; Utrilla-Manso, Manuel; Rosa-Zurera, Manuel
Format:	Article
Language:	English
Keywords:	Audio processing; Computational cost; Violence detection

Description
Abstract:	Audio-based violence detection is a critical research area for enhancing public safety and security. This paper compares machine learning models, specifically Convolutional Neural Networks (CNNs) and Shallow Networks, in the context of audio violence detection. We evaluate these models under varying training set configurations and data augmentation techniques, analyzing their impact on model performance and robustness under varying real-world conditions. Specifically, we address the issue of domain shifts, exploring how models perform under different types of noise and reverberation. Our results highlight scenarios where Shallow Networks, despite their lower computational costs, exhibit performance nearly on par with that of high-cost CNNs. Introducing tailored data augmentation techniques significantly enhances the models' performance and stability against domain shifts, providing a promising direction for improving system robustness. Our research underscores the value of careful model selection for real-world audio-based violence detection applications, recognizing the importance of an optimal trade-off between computational cost and performance, especially in resource-constrained scenarios. This research provides valuable insights for researchers and practitioners in developing more efficient, robust and accurate audio-based violence detection systems.
• Explored deep vs shallow models for audio violence, assessing computational cost and environmental impact.
• New model selection perspective for audio violence, highlighting efficiency in resource-constrained areas.
• First to address domain shifts in audio violence, enhancing algorithms' robustness under real-world conditions.
• Innovative data augmentation advances audio violence detection, enhancing robustness to varied acoustics.
ISSN:	0003-682X, 1872-910X
DOI:	10.1016/j.apacoust.2023.109638