MetaFuse: A Pre-trained Fusion Model for Human Pose Estimation
Main authors: | , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Cross-view feature fusion is key to addressing the occlusion problem in
human pose estimation. Current fusion methods need to train a separate
model for every pair of cameras, which makes them difficult to scale. In this work,
we introduce MetaFuse, a pre-trained fusion model learned from a large number
of cameras in the Panoptic dataset. The model can be efficiently adapted or
finetuned for a new pair of cameras using a small number of labeled images. The
strong adaptation power of MetaFuse is due in large part to the proposed
factorization of the original fusion model into two parts: (1) a generic fusion
model shared by all cameras, and (2) lightweight camera-dependent
transformations. Furthermore, the generic model is learned from many cameras by
a meta-learning-style algorithm to maximize its adaptation capability to
various camera poses. We observe in experiments that MetaFuse finetuned on the
public datasets outperforms the state of the art by a large margin, which
validates its value in practice. |
---|---|
DOI: | 10.48550/arxiv.2003.13239 |