Bayesian Inferred Self-Attentive Aggregation for Multi-Shot Person Re-Identification



Bibliographic Details
Published in: IEEE Transactions on Circuits and Systems for Video Technology, 2020-10, Vol. 30 (10), p. 3446-3458
Authors: Liu, Xiaokai; Bi, Sheng; Fang, Shaojun; Bouridane, Ahmed
Format: Article
Language: English
Description
Abstract: Person re-identification is a challenging retrieval task that aims to match pedestrians across multiple non-overlapping cameras. In this paper, we introduce a deep multi-instance learning framework that aggregates instance-level images to boost retrieval performance. Considerable annotation inconsistency inevitably arises in many current person re-identification datasets, due to careless annotation or dramatic variations in surveillance scenarios, thereby leading to model drifting. To alleviate this issue, we formulate the person re-identification problem in a weakly supervised setting and propose a self-inspired attention model based on Bayesian inference that adaptively evaluates regional features with their global dependencies across instances, which we refer to as Bayesian Inferred Self-Attentive Aggregation (BISAA). The evaluation mechanism is parameterized by neural networks to provide insight into the contribution of each instance and semantic human part to set-level labels. Furthermore, to facilitate aggregation across a set of instances, we propose a new collective aggregation function that makes the model more robust to outliers: by adjusting an activation threshold, some non-informative instances can be ignored while more attention is paid to the discriminative ones. Extensive experiments with ablation analysis show the effectiveness of our method, and the proposed method outperforms many related state-of-the-art techniques on four benchmark datasets: PRID2011, iLIDS-VID, Market-1501, and MSMT17.
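The thresholded, attention-weighted set aggregation described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' BISAA model (which learns the attention via Bayesian inference inside a neural network); here the attention query is a simple mean-feature stand-in, and the threshold value is an illustrative assumption.

```python
import numpy as np

def self_attentive_aggregate(features, threshold=0.05):
    """Aggregate a set of instance features into one set-level descriptor.

    features: (n_instances, d) array of per-instance feature vectors.
    threshold: attention weights below this value are zeroed, so
               non-informative instances are ignored (a stand-in for the
               paper's collective aggregation with an activation threshold).
    """
    # Score each instance against a query; the mean feature is used here
    # as a hypothetical stand-in for the learned attention parameters.
    query = features.mean(axis=0)
    scores = features @ query / np.sqrt(features.shape[1])

    # Softmax over instances gives normalized attention weights.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()

    # Suppress low-attention (non-informative) instances, then renormalize.
    weights = np.where(weights >= threshold, weights, 0.0)
    weights /= weights.sum()

    # Weighted sum yields the set-level descriptor.
    return weights @ features
```

For example, aggregating four instance vectors where one is an outlier down-weights that outlier rather than letting it dominate the set-level representation.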
ISSN:1051-8215
1558-2205
DOI:10.1109/TCSVT.2019.2957539