DBMHT: A double-branch multi-hypothesis transformer for 3D human pose estimation in video

The estimation of 3D human poses from monocular videos presents a significant challenge. The existing methods face the problems of deep ambiguity and self-occlusion. To overcome these problems, we propose a Double-Branch Multi-Hypothesis Transformer (DBMHT). In detail, we utilize a Double-Branch arc...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Computer vision and image understanding 2024-12, Vol.249, p.104147, Article 104147
Hauptverfasser: Xiang, Xuezhi, Li, Xiaoheng, Bao, Weijie, Qiao, Yulong, El Saddik, Abdulmotaleb
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:The estimation of 3D human poses from monocular videos presents a significant challenge. The existing methods face the problems of deep ambiguity and self-occlusion. To overcome these problems, we propose a Double-Branch Multi-Hypothesis Transformer (DBMHT). In detail, we utilize a Double-Branch architecture to capture temporal and spatial information and generate multiple hypotheses. To merge these hypotheses, we adopt a lightweight module to integrate spatial and temporal representations. The DBMHT can not only capture spatial information from each joint in the human body and temporal information from each frame in the video but also merge multiple hypotheses that have different spatio-temporal information. Comprehensive evaluation on two challenging datasets (i.e. Human3.6M and MPI-INF-3DHP) demonstrates the superior performance of DBMHT, marking it as a robust and efficient approach for accurate 3D HPE in dynamic scenarios. The results show that our model surpasses the state-of-the-art approach by 1.9% MPJPE with ground truth 2D keypoints as input. •We propose a Dual-Branch Transformer structure to capture both spatial and temporal information.•We propose a lightweight Multi-Hypothesis Merging module to improve the performance of 3D pose estimation.•We propose DBMHT which achieves competitive performance on Human3.6M and MPI-INF-3DHP dataset.
ISSN:1077-3142
DOI:10.1016/j.cviu.2024.104147