Temporal segment dropout for human action video recognition
Published in: | Pattern Recognition 2024-02, Vol.146, p.109985, Article 109985 |
---|---|
Main authors: | , , , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | Temporal information is important for human action video recognition. With the widely used spatio-temporal neural networks, researchers have found that the learned high-level features retain overfitted spatial information but only limited temporal information, leading to inferior performance. This is because existing networks lack efficient regularization for the temporal structure. To learn more robust temporal features, we propose a temporal regularization method named Temporal Segment Dropout (TSD). TSD drops the most salient spatial features in order to enhance the temporal features in a clip of temporal segments. TSD does not need to learn from complex examples and can be easily deployed in existing networks. In experiments on benchmark action recognition datasets, TSD brings consistent improvements over the baselines, especially for the action-centric classes.
• We propose to drop the salient spatial feature map to enhance the temporal features in CNNs for video recognition. • We propose a new temporal regularization method named Temporal Segment Dropout (TSD), which can be efficiently deployed in existing CNNs. • TSD is evaluated on benchmark video recognition datasets, where it consistently improves on the baseline methods. |
---|---|
ISSN: | 0031-3203 1873-5142 |
DOI: | 10.1016/j.patcog.2023.109985 |
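
The summary above describes TSD only at a high level: the most salient spatial features in a clip of temporal segments are dropped during training. The following is a minimal sketch of that general idea, not the authors' implementation. The function name `temporal_segment_dropout_sketch`, the `(T, C, H, W)` layout, the use of mean absolute activation as the saliency score, and the `drop_prob` parameter are all assumptions made for illustration; the record does not specify any of them.

```python
import numpy as np


def temporal_segment_dropout_sketch(features, drop_prob=0.5, training=True, rng=None):
    """Illustrative only: zero the feature map of the most salient temporal segment.

    features: array of shape (T, C, H, W), one feature map per temporal segment.
    Saliency is taken to be the mean absolute activation of each segment,
    which is an assumption and not the definition used in the paper.
    """
    features = np.asarray(features, dtype=float)
    if features.ndim != 4:
        raise ValueError("expected features of shape (T, C, H, W)")

    # Like ordinary dropout, the sketch is a no-op at inference time.
    if not training:
        return features.copy()

    rng = np.random.default_rng() if rng is None else rng
    # Apply the drop only with probability drop_prob (hypothetical parameter).
    if rng.random() >= drop_prob:
        return features.copy()

    # One saliency score per temporal segment, shape (T,).
    saliency = np.abs(features).mean(axis=(1, 2, 3))
    most_salient = int(np.argmax(saliency))

    # Zero out the most salient segment; the input array is left untouched.
    out = features.copy()
    out[most_salient] = 0.0
    return out


# Usage: a clip of 4 segments in which segment 2 has the strongest activations.
clip = np.ones((4, 2, 3, 3))
clip[2] *= 5.0
dropped = temporal_segment_dropout_sketch(clip, drop_prob=1.0)
```

With `drop_prob=1.0` the drop always fires, so segment 2 is zeroed and the other segments pass through unchanged. Forcing the network to classify without its strongest spatial response is one plausible reading of how such a regularizer could shift emphasis toward temporal cues; the published method may differ in how saliency is measured and what is dropped.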