Human Action Recognition by Discriminative Feature Pooling and Video Segment Attention Model

Bibliographic Details
Published in: IEEE Transactions on Multimedia, 2022, Vol. 24, pp. 689-701
Main authors: Moniruzzaman, Md; Yin, Zhaozheng; He, Zhihai; Qin, Ruwen; Leu, Ming C
Format: Article
Language: English
Description
Summary: We introduce a simple yet effective network that embeds a novel Discriminative Feature Pooling (DFP) mechanism and a novel Video Segment Attention Model (VSAM) for video-based human action recognition from both trimmed and untrimmed videos. Our DFP module introduces an attentional pooling mechanism for 3D Convolutional Neural Networks that pools 3D convolutional feature maps to emphasize the most critical spatial, temporal, and channel-wise features related to the actions within a video segment. Our VSAM ensembles these most critical features from all video segments and learns (1) class-specific attention weights to classify the video segments into the corresponding action categories, and (2) class-agnostic attention weights to rank the video segments by their relevance to the action class. Our action recognition network can be trained on trimmed videos in a fully supervised way and on untrimmed videos in a weakly supervised way. For untrimmed videos with weak labels, our network learns the attention weights without requiring precise temporal annotations of action occurrences in the videos. Evaluated on the untrimmed video datasets THUMOS14 and ActivityNet1.2 and the trimmed video datasets HMDB51, UCF101, and HOLLYWOOD2, our network achieves promising performance compared to the latest state-of-the-art methods. The implementation code is available at https://github.com/MoniruzzamanMd/DFP-VSAM-Networks.
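
The two-stage idea in the summary (attentional pooling within a segment, then attention across segments) can be illustrated with a minimal sketch. This is not the authors' implementation: the summary gives no formulas, so the softmax-based scoring, the function names, and the parameters `w_att`, `class_weights`, and `rank_weights` are all assumptions made for illustration. The pooling here covers space-time locations only and omits the channel-wise attention the summary mentions.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attentional_pool(locations, w_att):
    """Pool one segment's feature map into a single C-dim vector.

    locations: list of C-dim vectors, one per flattened space-time
               position of a 3D convolutional feature map.
    w_att:     hypothetical C-dim vector that scores each position.
    """
    alpha = softmax([dot(w_att, loc) for loc in locations])
    channels = len(w_att)
    # Attention-weighted average instead of plain average pooling.
    return [sum(a * loc[c] for a, loc in zip(alpha, locations))
            for c in range(channels)]


def segment_attention(seg_feats, class_weights, rank_weights):
    """Combine pooled segment features into a video-level prediction.

    seg_feats:     S pooled C-dim segment vectors.
    class_weights: K hypothetical C-dim vectors (class-specific scores).
    rank_weights:  hypothetical C-dim vector (class-agnostic relevance).
    """
    # Class-specific: a distribution over the K classes for each segment.
    class_probs = [softmax([dot(w_k, f) for w_k in class_weights])
                   for f in seg_feats]
    # Class-agnostic: one relevance weight per segment, normalized
    # across segments so that they can be ranked.
    seg_weights = softmax([dot(rank_weights, f) for f in seg_feats])
    num_classes = len(class_weights)
    video_probs = [sum(s * p[k] for s, p in zip(seg_weights, class_probs))
                   for k in range(num_classes)]
    return video_probs, seg_weights
```

In a sketch like this, only a video-level label is needed to train the weights, since the segment weights are learned jointly with the classifier. That is the general reason attention of this kind suits weakly supervised training on untrimmed videos.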
ISSN: 1520-9210, 1941-0077
DOI: 10.1109/TMM.2021.3058050