Detection and localization of anomalous objects in video sequences using vision transformers and U-Net model
Published in: | Signal, Image and Video Processing, 2024, Vol. 18 (8-9), pp. 6379-6390 |
Main authors: | , , |
Format: | Article |
Language: | English |
Online access: | Full text |
Abstract: | The detection and localization of anomalous objects in video sequences remain a challenging task in video analysis. Recent years have witnessed a surge in deep learning approaches, especially recurrent neural networks (RNNs). However, RNNs have limitations that vision transformers (ViTs) can address. We propose a novel solution that leverages ViTs, which have recently achieved remarkable success across computer vision tasks. Our approach involves a two-step process. First, we use a pre-trained ViT model to generate an intermediate representation containing an attention map that highlights regions critical for anomaly detection. In the second step, this attention map is concatenated with the original video frame, and the enriched representation is fed into a U-Net model, guiding it towards anomaly-prone regions for precise localization of the anomalous objects. The model achieved a mean Intersection over Union (IoU) of 0.70, indicating strong overlap between the predicted bounding boxes and the ground-truth annotations; in anomaly detection, a higher IoU signifies better localization. Moreover, a pixel accuracy of 0.99 demonstrates a high level of precision in classifying individual pixels. We also compared the localization accuracy of our method with that of other approaches; the results show that it outperforms most previous methods and achieves highly competitive performance. |
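The abstract describes the two-step pipeline concretely enough to sketch. The following is a minimal, illustrative PyTorch sketch, not the authors' code: the helper names (`attention_to_map`, `localize`, `iou_and_pixel_accuracy`) and the `unet` module are hypothetical, and how the ViT attention weights are extracted (e.g. via a forward hook on a pretrained backbone) is assumed rather than specified by the paper.

```python
import torch
import torch.nn.functional as F


def attention_to_map(attn: torch.Tensor, size) -> torch.Tensor:
    """Collapse ViT attention weights into a frame-sized saliency map.

    `attn` is assumed to be last-block attention of shape (B, heads, N, N),
    where token 0 is the CLS token and the remaining N - 1 tokens form a
    square patch grid. Extracting these weights from the ViT (e.g. with a
    forward hook) is model-specific and omitted here.
    """
    b, _, n, _ = attn.shape
    grid = int((n - 1) ** 0.5)                      # side of the patch grid
    cls_to_patch = attn[:, :, 0, 1:].mean(dim=1)    # CLS->patch, head-averaged
    amap = cls_to_patch.reshape(b, 1, grid, grid)
    amap = F.interpolate(amap, size=size, mode="bilinear", align_corners=False)
    # Min-max normalize per frame so the map acts as a soft [0, 1] prior.
    mn = amap.amin(dim=(2, 3), keepdim=True)
    mx = amap.amax(dim=(2, 3), keepdim=True)
    return (amap - mn) / (mx - mn + 1e-8)


def localize(frame: torch.Tensor, attn: torch.Tensor, unet) -> torch.Tensor:
    """Step 2: concatenate the attention map with the RGB frame and let a
    U-Net (whose first conv accepts 3 + 1 = 4 channels) predict a
    per-pixel anomaly mask."""
    amap = attention_to_map(attn, size=frame.shape[-2:])
    enriched = torch.cat([frame, amap], dim=1)      # (B, 4, H, W)
    return torch.sigmoid(unet(enriched))            # mask values in [0, 1]


def iou_and_pixel_accuracy(pred: torch.Tensor, target: torch.Tensor, thr=0.5):
    """The two metrics reported in the abstract: IoU and pixel accuracy."""
    p, t = pred > thr, target > 0.5
    inter = (p & t).sum().float()
    union = (p | t).sum().float()
    iou = inter / union.clamp(min=1)                # guard against 0 / 0
    acc = (p == t).float().mean()
    return iou.item(), acc.item()
```

A four-channel first convolution is one plausible way for the U-Net to consume the concatenated RGB-plus-attention input; the paper's exact fusion scheme and training details may differ.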
ISSN: | 1863-1703 (print), 1863-1711 (electronic) |
DOI: | 10.1007/s11760-024-03323-w |