Gait Recognition With Multi-Level Skeleton-Guided Refinement


Bibliographic details
Published in: IEEE Transactions on Multimedia, 2024, Vol. 26, pp. 4515-4526
Main authors: Wang, Runsheng; Shi, Yuxuan; Ling, Hefei; Li, Zongyi; Zhao, Chengxin; Wei, Bohao; Li, He; Li, Ping
Format: Article
Language: English
Description
Summary: Existing methods that combine skeleton and silhouette representations have proven effective for gait recognition. However, they simply combine the video-level representations of model-based skeleton data and gait silhouettes for retrieval, so diverse skeleton information is not fully exploited. First, the position and movement of bones are not clear from individual silhouettes, which makes frame-level interaction between skeleton and silhouette features critical; previous methods ignore it. Second, diverse part-level skeleton-guided gait features are not fully captured. To address these issues, we present a novel framework with multi-level skeleton-guided refinement (frame-level, part-level, and video-level) for comprehensive skeleton-aided gait representation learning. At the frame level, two modules are proposed. The Visual Skeleton Enhanced Backbone (VSEB) visually highlights the global and part-level skeleton regions in the feature of each silhouette frame, and Cross-Visual-Model Frame-level Interaction (CVMFI) further transfers the model-based skeleton information to features of the visual modalities. At the part level, visual and model-based skeleton features are used to refine the final gait representation: within VSEB, the Part Skeleton Enhance Network (PSEN) visually enhances the position and movement of part-level skeletons, and Semantic Part Pooling (SPP) captures the model-based skeleton features of different semantic parts. At the video level, multi-modal video-level features are combined to boost the final recognition performance. Extensive experimental results on prevailing datasets demonstrate that our approach outperforms most existing methods, including skeleton-aided multi-modal methods. With multi-level refinement guided by the skeleton modalities, the framework is expected to provide a deeper understanding of skeleton-aided gait recognition.
ISSN: 1520-9210, 1941-0077
DOI: 10.1109/TMM.2023.3323887
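
Illustrative note: the summary mentions Semantic Part Pooling (SPP), which captures model-based skeleton features for different semantic parts, but gives no implementation details. The sketch below is not the authors' method. It only shows the general idea of pooling per-joint features within body-part groups, and it rests on stated assumptions: a 17-joint COCO-style keypoint layout, a hypothetical grouping into head, arms, and legs (`PARTS`), and simple mean pooling. The function name `semantic_part_pool` is likewise made up for illustration.

```python
from statistics import fmean

# ASSUMPTION: a 17-joint COCO-style skeleton, grouped into three parts.
# The grouping used in the paper is not stated in the summary.
PARTS = {
    "head": [0, 1, 2, 3, 4],            # nose, eyes, ears
    "arms": [5, 6, 7, 8, 9, 10],        # shoulders, elbows, wrists
    "legs": [11, 12, 13, 14, 15, 16],   # hips, knees, ankles
}


def semantic_part_pool(joint_features, parts=PARTS):
    """Average per-joint feature vectors within each semantic part.

    joint_features: list of feature vectors (lists of floats), indexed by joint id.
    Returns a dict mapping each part name to its pooled feature vector.
    ASSUMPTION: mean pooling; the paper may use a different operator.
    """
    pooled = {}
    for name, joint_ids in parts.items():
        vectors = [joint_features[j] for j in joint_ids]
        # zip(*vectors) walks the vectors channel by channel.
        pooled[name] = [fmean(channel) for channel in zip(*vectors)]
    return pooled


# Toy input: each joint j has the 2-channel feature [j, 1.0].
features = [[float(j), 1.0] for j in range(17)]
pooled = semantic_part_pool(features)
for part, vector in pooled.items():
    print(part, vector)
```

Each part ends up with one vector of the same width as the per-joint features, so part-level descriptors can be compared or concatenated regardless of how many joints a part contains.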