Flow guided mutual attention for person re-identification



Bibliographic Details
Published in: Image and Vision Computing, 2021-09, Vol. 113, p. 104246, Article 104246
Main authors: Kiran, Madhu; Bhuiyan, Amran; Nguyen-Meidine, Le Thanh; Blais-Morin, Louis-Antoine; Ayed, Ismail Ben; Granger, Eric
Format: Article
Language: English
Online access: Full text
Description
Summary: Person Re-Identification (ReID) is a challenging problem in many video analytics and surveillance applications, where a person's identity must be associated across a distributed network of non-overlapping cameras. Video-based person ReID has recently gained much interest given the potential to capture discriminant spatio-temporal information from video clips that is unavailable to image-based ReID. Despite recent advances, deep learning (DL) models for video ReID often fail to leverage this information to improve the robustness of feature representations. In this paper, the motion pattern of a person is explored as an additional cue for ReID. In particular, a flow-guided Mutual Attention network is proposed for the fusion of bounding box and optical flow sequences over tracklets using any 2D-CNN backbone, allowing temporal information to be encoded along with spatial appearance information. Our Mutual Attention network relies on joint spatial attention between image and optical flow feature maps to activate a common set of salient features. In addition to flow-guided attention, we introduce a method to aggregate features from longer input streams for a better video sequence-level representation. Our extensive experiments on three challenging video ReID datasets indicate that the proposed approach considerably improves recognition accuracy with respect to conventional gated-attention networks and state-of-the-art methods for video-based person ReID.

Highlights:
•A Mutual Attention network is proposed to leverage both optical flow and video stream inputs for video person ReID.
•Longer sequences help in capturing robust features.
•Longer sequences need careful attention-based weighting to disregard outliers.
•The Mutual Attention model can be used with different backbones.
ISSN: 0262-8856, 1872-8138
DOI:10.1016/j.imavis.2021.104246
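The two ideas in the summary above — gating image and optical-flow feature maps with a joint spatial attention map, and attention-weighting frames when aggregating longer tracklets — can be sketched as follows. This is a minimal illustrative sketch in NumPy, not the authors' implementation: the function names, the sigmoid gating of the element-wise product, and the softmax-over-frames aggregation are all assumptions made for clarity.

```python
import numpy as np

def mutual_attention(img_feat, flow_feat):
    """Sketch of mutual attention: derive one shared spatial attention map
    from image and optical-flow feature maps (both shaped (C, H, W)) and
    use it to gate both streams. Hypothetical formulation."""
    # Element-wise product is large where BOTH streams respond strongly,
    # i.e. at locations salient in appearance and in motion.
    joint = img_feat * flow_feat
    # Collapse channels to a (H, W) saliency map and squash to (0, 1).
    saliency = 1.0 / (1.0 + np.exp(-joint.mean(axis=0)))
    # Activate the common set of salient features in both streams.
    return img_feat * saliency, flow_feat * saliency

def aggregate_tracklet(frame_feats):
    """Sketch of sequence-level aggregation for longer input streams:
    softmax-weight per-frame descriptors (T, D) by their agreement with
    the clip mean, down-weighting outlier frames. Hypothetical scheme."""
    mean = frame_feats.mean(axis=0)
    scores = frame_feats @ mean                 # agreement with the clip mean
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                    # softmax over the T frames
    return weights @ frame_feats                # (D,) clip-level descriptor

# Toy shapes: 8-channel 4x4 feature maps, a 16-frame tracklet of 32-D features.
rng = np.random.default_rng(0)
att_img, att_flow = mutual_attention(rng.standard_normal((8, 4, 4)),
                                     rng.standard_normal((8, 4, 4)))
clip_desc = aggregate_tracklet(rng.standard_normal((16, 32)))
print(att_img.shape, att_flow.shape, clip_desc.shape)
```

In this sketch the attention map is shared rather than computed per stream, mirroring the paper's description of a *common* set of salient features; the backbone producing the (C, H, W) maps is left out and could be any 2D-CNN.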