Action-conditioned contrastive learning for 3D human pose and shape estimation in videos

The aim of this research is to estimate 3D human pose and shape in videos, which is a challenging task due to the complex nature of the human body and the wide range of possible pose and shape variations. This problem also poses difficulty in finding a satisfactory solution due to the trade-off betw...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Computer vision and image understanding 2024-12, Vol.249, p.104149, Article 104149
Hauptverfasser: Song, Inpyo, Ryu, Moonwook, Lee, Jangwon
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:The aim of this research is to estimate 3D human pose and shape in videos, which is a challenging task due to the complex nature of the human body and the wide range of possible pose and shape variations. This problem also poses difficulty in finding a satisfactory solution due to the trade-off between the accuracy and temporal consistency of the estimated 3D pose and shape. Thus previous researches have prioritized one objective over the other. In contrast, we propose a novel approach called the action-conditioned mesh recovery (ACMR) model, which improves accuracy without compromising temporal consistency by leveraging human action information. Our ACMR model outperforms existing methods that prioritize temporal consistency in terms of accuracy, while also achieving comparable temporal consistency with other state-of-the-art methods. Significantly, the action-conditioned learning process occurs only during training, requiring no additional resources at inference time, thereby enhancing performance without increasing computational demands. •Leverages human action information for 3D human mesh recovery.•Simple yet effective approach which can adopted to various models.•Improves accuracy without compromising temporal consistency.•Enhances mesh recovery without extra computational costs for inference.
ISSN:1077-3142
DOI:10.1016/j.cviu.2024.104149