Dynamic Erasing Network Based on Multi-Scale Temporal Features for Weakly Supervised Video Anomaly Detection
Main authors: | , , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | The goal of weakly supervised video anomaly detection is to learn a detection
model using only video-level labeled data. However, prior studies typically
divide videos into fixed-length segments without considering the complexity or
duration of anomalies. Moreover, these studies usually just detect the most
abnormal segments, potentially overlooking the completeness of anomalies. To
address these limitations, we propose a Dynamic Erasing Network (DE-Net) for
weakly supervised video anomaly detection, which learns multi-scale temporal
features. Specifically, to handle duration variations of abnormal events, we
first propose a multi-scale temporal modeling module, capable of extracting
features from segments of varying lengths and capturing both local and global
visual information across different temporal scales. Then, we design a dynamic
erasing strategy, which dynamically assesses the completeness of the detected
anomalies and erases prominent abnormal segments in order to encourage the
model to discover gentle abnormal segments in a video. The proposed method
obtains favorable performance compared to several state-of-the-art approaches
on three datasets: XD-Violence, TAD, and UCF-Crime. Code will be made available
at https://github.com/ArielZc/DE-Net. |
---|---|
DOI: | 10.48550/arxiv.2312.01764 |
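
The two ideas named in the summary, pooling over segments of varying lengths and erasing the most prominent segments so that weaker ones can surface, can be illustrated with a toy sketch. This is not the DE-Net implementation: the function names, the non-overlapping average pooling, and the fixed erasing threshold are all assumptions made for illustration, and the summary does not specify how the actual model assesses completeness or chooses what to erase.

```python
from typing import List, Optional


def multi_scale_pool(scores: List[float], window_sizes: List[int]) -> List[List[float]]:
    """Average per-snippet values over non-overlapping windows of several lengths.

    Assumption: plain average pooling stands in for a learned multi-scale module.
    """
    pooled = []
    for w in window_sizes:
        row = []
        for start in range(0, len(scores), w):
            window = scores[start:start + w]  # last window may be shorter
            row.append(sum(window) / len(window))
        pooled.append(row)
    return pooled


def erase_prominent(scores: List[float], threshold: float) -> List[bool]:
    """Return a keep-mask: False where a segment scores at or above the threshold.

    Assumption: a fixed threshold replaces the dynamic completeness check.
    """
    return [s < threshold for s in scores]


def second_pass_top(scores: List[float], mask: List[bool]) -> Optional[int]:
    """Index of the highest-scoring segment that was not erased, or None."""
    candidates = [i for i, keep in enumerate(mask) if keep]
    if not candidates:
        return None
    return max(candidates, key=lambda i: scores[i])


# Hypothetical per-segment anomaly scores for one video.
scores = [0.1, 0.2, 0.9, 0.8, 0.4, 0.1]
pooled = multi_scale_pool(scores, [1, 2, 3])
mask = erase_prominent(scores, threshold=0.7)
next_best = second_pass_top(scores, mask)
```

With these hypothetical scores and a threshold of 0.7, segments 2 and 3 are erased, and the second pass selects segment 4, a weaker segment that a top-score-only detector would have ignored.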