3D Deformable Convolution Temporal Reasoning network for action recognition

Modeling and reasoning of the interactions between multiple entities (actors and objects) are beneficial for the action recognition task. In this paper, we propose a 3D Deformable Convolution Temporal Reasoning (DCTR) network to model and reason about the latent relationship dependencies between dif...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Journal of visual communication and image representation 2023-05, Vol.93, p.103804, Article 103804
Hauptverfasser: Ou, Yangjun, Chen, Zhenzhong
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Modeling and reasoning of the interactions between multiple entities (actors and objects) are beneficial for the action recognition task. In this paper, we propose a 3D Deformable Convolution Temporal Reasoning (DCTR) network to model and reason about the latent relationship dependencies between different entities in videos. The proposed DCTR network consists of a spatial modeling module and a temporal reasoning module. The spatial modeling module uses 3D deformable convolution to capture relationship dependencies between different entities in the same frame, while the temporal reasoning module uses Conv-LSTM to reason about the changes of multiple entity relationship dependencies in the temporal dimension. Experiments on the Moments-in-Time dataset, UCF101 dataset and HMDB51 dataset demonstrate that the proposed method outperforms several state-of-the-art methods.
ISSN:1047-3203
1095-9076
DOI:10.1016/j.jvcir.2023.103804