Deep salient-Gaussian Fisher vector encoding of the spatio-temporal trajectory structures for person re-identification

Bibliographic Details
Published in: Multimedia Tools and Applications, 2019, Vol. 78 (2), pp. 1583-1611
Authors: Ksibi, Salma; Mejdoub, Mahmoud; Amar, Chokri Ben
Format: Article
Language: English
Online access: Full text
Description
Abstract: In this paper, we propose a deep spatio-temporal appearance (DSTA) descriptor for person re-identification (re-ID). The proposed descriptor is based on the deep Fisher vector (FV) encoding of the trajectory spatio-temporal structures, which robustly handle the misalignment in pedestrian tracklets. The deep encoding exploits the richness of the spatio-temporal structural information around the trajectories by encoding the trajectory structures hierarchically, using a larger tracklet neighborhood scale at each successive layer. To suppress the noisy background around the pedestrian and model the uniqueness of their identity, the deep FV encoder is further enriched into the deep Salient-Gaussian weighted FV (deepSGFV) encoder by integrating the pedestrian Gaussian and saliency templates, respectively, into the encoding process. The proposed descriptor achieves competitive accuracy with respect to state-of-the-art methods, and in particular deep CNN approaches, without requiring pre-training or data augmentation, on four challenging pedestrian video datasets: PRID2011, i-LIDS-VID, MARS, and LPW. Combining DSTA with a deep CNN further improves on the current state of the art and demonstrates their complementarity.
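The weighted FV encoding summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the fitted diagonal-covariance GMM, the trajectory descriptor matrix `descs`, and the per-descriptor `weights` (standing in for the product of the pedestrian Gaussian and saliency templates) are all assumptions made for illustration.

```python
# Minimal sketch of a saliency/Gaussian-weighted Fisher vector encoding
# of local trajectory descriptors (assumptions: diagonal-covariance GMM,
# weights combine the Gaussian and saliency templates described in the paper).
import numpy as np
from sklearn.mixture import GaussianMixture

def weighted_fisher_vector(descs, weights, gmm):
    """Encode a T x D matrix of local descriptors into a weighted FV.

    descs   : T x D array of trajectory-aligned local descriptors.
    weights : length-T array of per-descriptor weights (hypothetical stand-in
              for the pedestrian Gaussian and saliency templates).
    gmm     : fitted sklearn GaussianMixture with covariance_type='diag'.
    """
    T, _ = descs.shape
    pi, mu, var = gmm.weights_, gmm.means_, gmm.covariances_   # (K,), (K,D), (K,D)
    # Soft assignments to the K Gaussians, re-weighted per descriptor.
    gamma = gmm.predict_proba(descs) * weights[:, None]        # T x K

    fv = []
    for k in range(gmm.n_components):
        diff = (descs - mu[k]) / np.sqrt(var[k])                # whitened residuals, T x D
        g_mu = (gamma[:, k:k + 1] * diff).sum(0) / (T * np.sqrt(pi[k]))
        g_sig = (gamma[:, k:k + 1] * (diff ** 2 - 1)).sum(0) / (T * np.sqrt(2 * pi[k]))
        fv.extend([g_mu, g_sig])
    fv = np.concatenate(fv)                                     # length 2*K*D

    # Power and L2 normalization, as is standard for FV encodings.
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    return fv / (np.linalg.norm(fv) + 1e-12)
```

In the hierarchical setting described above, such an encoding would be applied per layer over progressively larger tracklet neighborhoods, with each layer's FVs serving as the input descriptors of the next; that stacking is sketched here only in words, not in code.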
ISSN: 1380-7501, 1573-7721
DOI: 10.1007/s11042-018-6200-5