Pedestrian 3D Shape Understanding for Person Re-Identification via Multi-View Learning

Recent development in computing power has resulted in performance improvements on holistic (none-occluded) person Re-Identification (ReID) tasks. Nevertheless, the precision of the recent research will diminish when a pedestrian is obstructed by obstacles. Within the realm of 2D space, the loss of i...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:IEEE transactions on circuits and systems for video technology 2024-07, Vol.34 (7), p.5589-5602
Hauptverfasser: Yu, Zaiyang, Li, Lusi, Xie, Jinlong, Wang, Changshuo, Li, Weijun, Ning, Xin
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Recent development in computing power has resulted in performance improvements on holistic (none-occluded) person Re-Identification (ReID) tasks. Nevertheless, the precision of the recent research will diminish when a pedestrian is obstructed by obstacles. Within the realm of 2D space, the loss of information from obstructed objects continues to pose significant challenges in the context of person ReID. Person is a 3D non-grid object, and thus semantic representation learning in only 2D space limits the understanding of occluded person. In the present work, we propose a network based on 3D multi-view learning, allowing it to acquire geometric and shape details of an occluded pedestrian from 3D space. Simultaneously, it capitalizes on advancements in 2D-based networks to extract semantic representations from 3D multi-views. Specifically, the surface random selection strategy is proposed to convert images of 2D RGB into 3D multi-views. Using this strategy, we build four extensive 3D multi-view data collections for person ReID. After that, Pedestrian 3D Shape Understanding for Person Re-Identification via Multi-View Learning (MV-3DSReID), is proposed for identifying the person by learning person geometry and structure representation from the groups of multi-view images. In comparison to alternative data formats (e.g., 2D RGB, 3D point cloud), multi-view images complement each other's detailed features of the 3D object by adjusting rendering viewpoints, thus facilitating a more comprehensive understanding of the person for both holistic and occluded ReID situations. Experiments on occluded and holistic ReID tasks demonstrate performance levels comparable to state-of-the-art methods, validating the effectiveness of our proposed approach in tackling challenges related to occlusion. The code is available at https://github.com/hangjiaqi1/MV-TransReID .
ISSN:1051-8215
1558-2205
DOI:10.1109/TCSVT.2024.3358850