Bayesian Inferred Self-Attentive Aggregation for Multi-Shot Person Re-Identification



Bibliographic Details
Published in: IEEE Transactions on Circuits and Systems for Video Technology, 2020-10, Vol. 30 (10), p. 3446-3458
Authors: Liu, Xiaokai; Bi, Sheng; Fang, Shaojun; Bouridane, Ahmed
Format: Article
Language: English
Description
Abstract: Person re-identification is a challenging retrieval task that aims to match pedestrians across multiple non-overlapping cameras. In this paper, we introduce a deep multi-instance learning framework that aggregates instance-level images to boost retrieval performance. Considerable annotation inconsistency inevitably arises in many current person re-identification datasets, due to careless annotation or dramatic variations in surveillance scenarios, thereby leading to model drifting. To alleviate this issue, we formulate the person re-identification problem in a weakly supervised setting and propose a self-inspired attention model based on Bayesian inference that adaptively evaluates regional features with their global dependencies across instances, which we refer to as Bayesian Inferred Self-Attentive Aggregation (BISAA). The evaluation mechanism is parameterized by neural networks to provide insight into the contribution of each instance and semantic human part to set-level labels. Furthermore, to facilitate aggregation across a set of instances, we propose a new collective aggregation function that makes the model more robust to outliers: by adjusting an activation threshold, some non-informative instances can be ignored while more attention is paid to the discriminative ones. Extensive experiments with ablation analysis show the effectiveness of our method, and the proposed method outperforms many related state-of-the-art techniques on four benchmark datasets: PRID2011, iLIDS-VID, Market-1501, and MSMT17.
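The thresholded, attention-weighted set aggregation described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' BISAA model (which learns the attention via Bayesian inference inside a neural network); here the attention query is a simple mean-feature stand-in, and the threshold value is an illustrative assumption.

```python
import numpy as np

def self_attentive_aggregate(features, threshold=0.05):
    """Aggregate a set of instance features into one set-level descriptor.

    features: (n_instances, d) array of per-instance feature vectors.
    threshold: attention weights below this value are zeroed, so
               non-informative instances are ignored (a stand-in for the
               paper's collective aggregation with an activation threshold).
    """
    # Score each instance against a query; the mean feature is used here
    # as a hypothetical stand-in for the learned attention parameters.
    query = features.mean(axis=0)
    scores = features @ query / np.sqrt(features.shape[1])

    # Softmax over instances gives normalized attention weights.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()

    # Suppress low-attention (non-informative) instances, then renormalize.
    weights = np.where(weights >= threshold, weights, 0.0)
    weights /= weights.sum()

    # Weighted sum yields the set-level descriptor.
    return weights @ features
```

For example, aggregating four instance vectors where one is an outlier down-weights that outlier rather than letting it dominate the set-level representation.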
ISSN:1051-8215
1558-2205
DOI:10.1109/TCSVT.2019.2957539