Utilizing motion segmentation for optimizing the temporal adjacency matrix in 3D human pose estimation

In monocular 3D human pose estimation, modeling the temporal relation of human joints is crucial for prediction accuracy. Currently, most methods utilize transformer to model the temporal relation among joints. However, existing transformer-based methods have limitations. The temporal adjacency matr...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Neurocomputing (Amsterdam) 2024-10, Vol.600, p.128153, Article 128153
Hauptverfasser: Wang, Yingfeng, Li, Muyu, Yan, Hong
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:In monocular 3D human pose estimation, modeling the temporal relation of human joints is crucial for prediction accuracy. Currently, most methods utilize transformer to model the temporal relation among joints. However, existing transformer-based methods have limitations. The temporal adjacency matrix utilized within the self-attention of the temporal transformer inaccurately models the temporal relationships between frames, particularly in cases where distinct motions exhibit significant correlation despite having different physical interpretations and large temporal spans. To address this issue, we construct an artificial temporal adjacency matrix based on input data and introduce a temporal adjacency matrix hybrid module to blend this matrix with the model’s inherent temporal adjacency matrix, resulting in a novel composite temporal adjacency matrix. Through extensive experiments on Human3.6M and MPI-INF-3DHP datasets using state-of-the-art methods as benchmarks, our proposed method demonstrates a maximum improvement of up to 5.6% compared to the original approach.
ISSN:0925-2312
DOI:10.1016/j.neucom.2024.128153