PoseBERT: A Generic Transformer Module for Temporal 3D Human Modeling
Saved in:

Main authors:
Format: Article
Language: English
Subjects:
Online access: Order full text
Abstract: Training state-of-the-art models for human pose estimation in videos
requires datasets with annotations that are hard and expensive to obtain.
Although transformers have recently been utilized for body pose sequence
modeling, related methods rely on pseudo-ground truth to augment the currently
limited training data available for learning such models. In this paper, we
introduce PoseBERT, a transformer module that is fully trained on 3D Motion
Capture (MoCap) data via masked modeling. It is simple, generic and versatile,
as it can be plugged on top of any image-based model to turn it into a
video-based model that leverages temporal information. We showcase variants of
PoseBERT with different inputs, varying from 3D skeleton keypoints to rotations
of a 3D parametric model for either the full body (SMPL) or just the hands
(MANO). Since PoseBERT training is task-agnostic, the model can be applied to
several tasks such as pose refinement, future pose prediction or motion
completion without finetuning. Our experimental results validate that adding
PoseBERT on top of various state-of-the-art pose estimation methods
consistently improves their performance, while its low computational cost
allows us to use it in a real-time demo for smoothly animating a robotic hand
via a webcam. Test code and models are available at
https://github.com/naver/posebert.
DOI: 10.48550/arxiv.2208.10211
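
The abstract's core idea is a transformer trained on pose sequences via masked modeling: random frames are hidden and the model reconstructs them from temporal context. Below is a minimal, hypothetical PyTorch sketch of that idea. The class name, dimensions, masking ratio, and loss are illustrative assumptions, not the authors' implementation; see the linked repository for the actual code.

```python
# Hypothetical sketch of PoseBERT-style masked modeling over pose sequences.
# All names and hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn


class PoseBERTSketch(nn.Module):
    """Transformer encoder over a sequence of per-frame pose vectors.

    During training, randomly selected frames are replaced by a learned
    [MASK] token and the model reconstructs the original poses from the
    surrounding temporal context (masked modeling, as in the abstract).
    """

    def __init__(self, pose_dim=72, d_model=256, n_layers=4, n_heads=8, max_len=64):
        super().__init__()
        self.embed = nn.Linear(pose_dim, d_model)               # per-frame pose -> token
        self.mask_token = nn.Parameter(torch.zeros(d_model))    # learned [MASK] embedding
        self.pos = nn.Parameter(torch.zeros(max_len, d_model))  # learned positional encoding
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, pose_dim)                # token -> reconstructed pose

    def forward(self, poses, mask):
        # poses: (B, T, pose_dim); mask: (B, T) bool, True where the frame is hidden
        x = self.embed(poses)
        x = torch.where(mask.unsqueeze(-1), self.mask_token.expand_as(x), x)
        x = x + self.pos[: x.size(1)]
        return self.head(self.encoder(x))


# Toy training step: hide ~15% of frames and reconstruct them from context.
model = PoseBERTSketch()
poses = torch.randn(2, 64, 72)              # stand-in for SMPL pose parameters (24 joints x 3)
mask = torch.rand(2, 64) < 0.15
recon = model(poses, mask)
loss = ((recon - poses) ** 2)[mask].mean()  # reconstruction loss on masked frames only
loss.backward()
```

Roughly speaking, the task-agnostic uses listed in the abstract correspond to different masking patterns at inference time: masking trailing frames gives future pose prediction, masking a gap in the middle gives motion completion, and feeding noisy per-frame estimates through the model gives temporal refinement.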