Temporal segment dropout for human action video recognition

Bibliographic details
Published in:	Pattern Recognition, 2024-02, Vol. 146, p. 109985, Article 109985
Authors:	Zhang, Yu, Chen, Zhengjie, Xu, Tianyu, Zhao, Junjie, Mi, Siya, Geng, Xin, Zhang, Min-Ling
Format:	Article
Language:	English
Keywords:	Action recognition; Temporal regularization; Temporal segment dropout

Description
Abstract:	Temporal information is important for human action video recognition. With the widely used spatio-temporal neural networks, researchers have found that the learned high-level features preserve overfitted spatial information and only limited temporal information, leading to inferior performance. This is because existing networks lack efficient regularization of the temporal structure. To learn more robust temporal features, we propose a temporal regularization method named Temporal Segment Dropout (TSD). Within a clip composed of temporal segments, TSD drops the most salient spatial features so that the network is pushed to strengthen its temporal features. Since it does not require learning from complex examples, TSD can be easily deployed in existing networks. TSD is extensively evaluated on benchmark action recognition datasets, where it brings consistent improvements over the baselines, especially for action-centric classes.
Highlights:
• We propose to drop the salient spatial feature maps to enhance the temporal features in CNNs for video recognition.
• We propose a new temporal regularization method named Temporal Segment Dropout (TSD), which can be efficiently deployed in existing CNNs.
• TSD is evaluated on benchmark video recognition datasets and consistently improves the baseline methods.
ISSN:	0031-3203, 1873-5142
DOI:	10.1016/j.patcog.2023.109985
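As a rough illustration of the idea stated in the abstract (suppress the most salient spatial features within each temporal segment so the network has to rely more on temporal cues), the following NumPy sketch zeroes out the highest-activation channels segment by segment. It is not the authors' implementation: the (N, C, T, H, W) feature layout, the mean-activation saliency score, dropping at channel granularity, and the parameters `num_segments` and `drop_ratio` are all assumptions made for illustration, and the function name `temporal_segment_dropout` is hypothetical.

```python
import numpy as np


def temporal_segment_dropout(features, num_segments=4, drop_ratio=0.25, training=True):
    """Hypothetical sketch of salient-feature dropout per temporal segment.

    ASSUMPTIONS (not taken from the paper): features have shape
    (N, C, T, H, W); saliency of a channel is its mean activation over a
    segment; the top `drop_ratio` fraction of channels is zeroed in each
    of `num_segments` equal chunks along the time axis.
    """
    if not training:
        # Like ordinary dropout, the sketch is a no-op at inference time.
        return features
    n, c, t, h, w = features.shape
    if t % num_segments != 0:
        raise ValueError("T must be divisible by num_segments")
    seg_len = t // num_segments
    k = int(round(c * drop_ratio))  # number of channels dropped per segment
    out = features.copy()
    if k == 0:
        return out
    for s in range(num_segments):
        seg = slice(s * seg_len, (s + 1) * seg_len)
        # Saliency per (sample, channel): mean over the segment's frames and pixels.
        saliency = features[:, :, seg].mean(axis=(2, 3, 4))  # shape (N, C)
        # Indices of the k most salient channels for each sample.
        top = np.argsort(-saliency, axis=1)[:, :k]  # shape (N, k)
        for i in range(n):
            # Zero the selected channels only inside this temporal segment.
            out[i, top[i], seg] = 0.0
    return out
```

Because saliency is recomputed per segment, a channel that dominates only part of the clip is suppressed only there. A real implementation might also randomize which segments are affected or rescale the surviving activations; the abstract does not specify these details.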