Motion saliency based multi-stream multiplier ResNets for action recognition

Bibliographic details
Published in:	Image and vision computing 2021-03, Vol.107, p.104108, Article 104108
Main authors:	Zong, Ming; Wang, Ruili; Chen, Xiubo; Chen, Zhe; Gong, Yuanhao
Format:	Article
Language:	English
Subjects:	Action recognition; Motion saliency; Multiplicative connections; Spatiotemporal interactive information
Online access:	Full text

Description
Abstract:	In this paper, we propose Motion Saliency based multi-stream Multiplier ResNets (MSM-ResNets) for action recognition. The proposed MSM-ResNets model consists of three interactive streams: the appearance stream, the motion stream and the motion saliency stream. As in conventional two-stream CNN models, the appearance stream and the motion stream capture appearance information and motion information, respectively, while the motion saliency stream captures salient motion information. To make effective use of the spatiotemporal interactive information between streams, the proposed MSM-ResNets model establishes interactive connections between the streams instead of fusing the three streams only at the final output layer. Two kinds of multiplicative connections are injected: the first runs from the motion stream to the appearance stream, and the second runs from the motion saliency stream to the motion stream. Experimental results verify the effectiveness of the proposed MSM-ResNets on two standard action recognition datasets: UCF101 and HMDB51.
• A motion saliency stream is proposed for capturing the salient motion information.
• Multiplicative connections are injected between different streams.
• The motion saliency cue is shared from the motion saliency stream to the motion stream.
ISSN:	0262-8856 1872-8138
DOI:	10.1016/j.imavis.2021.104108
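
The two multiplicative connections named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say where inside a residual unit the multiplication is applied, so the form x + f(x * gate) used here is an assumption, as are all function names, the use of flat lists of floats in place of feature maps, and the toy residual function.

```python
from typing import Callable, List, Tuple

Vector = List[float]
Residual = Callable[[Vector], Vector]


def residual_unit(x: Vector, f: Residual) -> Vector:
    """Plain residual unit: x + f(x)."""
    return [xi + fi for xi, fi in zip(x, f(x))]


def multiplied_residual_unit(x: Vector, gate: Vector, f: Residual) -> Vector:
    """Residual unit with a multiplicative connection from another stream.

    Assumed form: x + f(x * gate), where * is elementwise and `gate`
    holds the other stream's features at the same layer.
    """
    gated = [xi * gi for xi, gi in zip(x, gate)]
    return [xi + fi for xi, fi in zip(x, f(gated))]


def msm_step(
    appearance: Vector,
    motion: Vector,
    saliency: Vector,
    f_a: Residual,
    f_m: Residual,
    f_s: Residual,
) -> Tuple[Vector, Vector, Vector]:
    """One hypothetical layer of the three interacting streams."""
    # The motion saliency stream is updated on its own.
    new_saliency = residual_unit(saliency, f_s)
    # Connection 2: motion saliency stream -> motion stream.
    new_motion = multiplied_residual_unit(motion, saliency, f_m)
    # Connection 1: motion stream -> appearance stream.
    new_appearance = multiplied_residual_unit(appearance, motion, f_a)
    return new_appearance, new_motion, new_saliency


def halve(v: Vector) -> Vector:
    """Toy stand-in for a learned residual function."""
    return [0.5 * t for t in v]
```

In this sketch a zero in the gating stream suppresses the residual update at that position, which is one way to read how a saliency cue could steer the motion stream before any final fusion takes place.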