Local and global aligned spatiotemporal attention network for video-based person re-identification


Bibliographic details
Published in: Multimedia Tools and Applications, 2020-12, Vol. 79 (45-46), pp. 34489-34512
Authors: Cheng, Li; Jing, Xiao-Yuan; Zhu, Xiaoke; Hu, Chang-Hui; Gao, Guangwei; Wu, Songsong
Format: Article
Language: English
Online access: Full text
Description
Abstract: Matching video clips of people across non-overlapping surveillance cameras (video-based person re-identification) is of significant importance in many real-world applications. In this paper, we address video-based person re-identification by developing a Local and Global Aligned Spatiotemporal Attention (LGASA) network. Our LGASA network consists of five cascaded modules: 3D convolutional layers, a residual block, a spatial transformer network (STN), a multi-stream recurrent network, and a multiple-attention module. Specifically, the 3D convolutional layers capture local, short-term, fast-varying motion information encoded in multiple adjacent original frames. The residual block extracts mid-level feature maps, and the STN aligns them. The multi-stream recurrent network is designed to exploit useful local and global long-term temporal dependencies from the aligned mid-level feature maps. The multiple-attention module aggregates feature vectors of the same body part (or the global stream) from different frames within each video into a single vector according to their importance. Experimental results on three video pedestrian datasets verify the effectiveness of the proposed local and global aligned spatiotemporal attention network.
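The core idea of the multiple-attention module described above — weighting each frame's feature vector by a softmax-normalized importance score and summing them into a single video-level vector — can be sketched as follows. This is an illustrative simplification in NumPy, not the authors' exact formulation; the scoring vector `w` stands in for whatever learned parameterization the paper uses.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D array
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_aggregate(frame_feats, w):
    """Aggregate per-frame feature vectors of one body part (or the
    global stream) into a single video-level vector, weighting each
    frame by a softmax-normalized importance score.

    frame_feats: (T, D) features for T frames
    w:           (D,)  scoring vector (a simplified, assumed
                        stand-in for the learned attention params)
    """
    scores = frame_feats @ w            # (T,) raw importance per frame
    alpha = softmax(scores)             # attention weights, sum to 1
    return alpha @ frame_feats, alpha   # (D,) aggregated vector

rng = np.random.default_rng(0)
T, D = 8, 128                           # e.g. 8 frames, 128-dim features
feats = rng.standard_normal((T, D))
w = rng.standard_normal(D)
video_vec, alpha = attention_aggregate(feats, w)
```

In the full model this aggregation would run once per body-part stream plus once for the global stream, yielding one vector per stream for matching.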
ISSN: 1380-7501; 1573-7721
DOI: 10.1007/s11042-020-08765-1