Hamba: Single-view 3D Hand Reconstruction with Graph-guided Bi-Scanning Mamba
Saved in:
Main authors: , , , ,
Format: Article
Language: English
Subjects:
Online access: Order full text
Abstract: 3D hand reconstruction from a single RGB image is challenging due to articulated motion, self-occlusion, and interaction with objects. Existing SOTA methods employ attention-based transformers to learn 3D hand pose and shape, yet they do not fully achieve robust and accurate performance, primarily because they model the spatial relations between joints inefficiently. To address this problem, we propose a novel graph-guided Mamba framework, named Hamba, which bridges graph learning and state space modeling. Our core idea is to reformulate Mamba's scanning into graph-guided bidirectional scanning for 3D reconstruction using a few effective tokens. This enables us to efficiently learn the spatial relationships between joints and thereby improve reconstruction performance. Specifically, we design a Graph-guided State Space (GSS) block that learns the graph-structured relations and spatial sequences of joints while using 88.5% fewer tokens than attention-based methods. Additionally, we integrate the state space features and the global features using a fusion module. By utilizing the GSS block and the fusion module, Hamba effectively leverages the graph-guided state space features and jointly considers global and local features to improve performance. Experiments on several benchmarks and in-the-wild tests demonstrate that Hamba significantly outperforms existing SOTAs, achieving a PA-MPVPE of 5.3 mm and an F@15mm of 0.992 on FreiHAND. At the time of this paper's acceptance, Hamba held the top position (Rank 1) on two competition leaderboards for 3D hand reconstruction. Project website: https://humansensinglab.github.io/Hamba/
DOI: 10.48550/arxiv.2407.09646
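For readers who want a concrete picture of the graph-guided bidirectional scanning described in the abstract, the sketch below is a minimal PyTorch illustration, not the authors' implementation. The class name `GraphGuidedBiScanSketch`, the chain adjacency standing in for the hand kinematic tree, and the gated linear recurrence standing in for Mamba's selective scan are all illustrative assumptions; the fusion with global image features mentioned in the abstract is omitted.

```python
# Minimal sketch (assumptions noted above): graph guidance over joint tokens
# followed by a forward/backward gated linear recurrence as a toy stand-in
# for Mamba's selective scan. Not the Hamba GSS block itself.
import torch
import torch.nn as nn


class GraphGuidedBiScanSketch(nn.Module):
    """Toy graph-guided bidirectional scan over hand-joint tokens."""

    def __init__(self, dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.graph_proj = nn.Linear(dim, dim)    # mixes neighbor-aggregated features
        self.in_proj = nn.Linear(dim, dim)       # token input to the recurrence
        self.gate = nn.Linear(dim, dim)          # data-dependent per-token decay
        self.out_proj = nn.Linear(2 * dim, dim)  # fuses forward/backward scans

    @staticmethod
    def _scan(u: torch.Tensor, a: torch.Tensor) -> torch.Tensor:
        # Gated linear recurrence h_t = a_t * h_{t-1} + (1 - a_t) * u_t,
        # accumulated along the joint dimension (dim=1).
        h = torch.zeros_like(u[:, 0])
        outs = []
        for t in range(u.shape[1]):
            h = a[:, t] * h + (1.0 - a[:, t]) * u[:, t]
            outs.append(h)
        return torch.stack(outs, dim=1)

    def forward(self, tokens: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # tokens: (B, J, D) joint tokens; adj: (J, J) row-normalized adjacency
        # of an assumed hand kinematic graph.
        x = self.norm(tokens)

        # 1) Graph guidance: aggregate features along skeletal connections.
        x = x + torch.relu(self.graph_proj(adj @ x))

        # 2) Bidirectional scan over the graph-ordered joint sequence.
        u = self.in_proj(x)
        a = torch.sigmoid(self.gate(x))
        fwd = self._scan(u, a)
        bwd = self._scan(u.flip(1), a.flip(1)).flip(1)

        # 3) Fuse both directions and add a residual connection.
        return tokens + self.out_proj(torch.cat([fwd, bwd], dim=-1))


# Hypothetical usage: 21 joint tokens with a simple chain adjacency standing
# in for the hand kinematic tree (both are illustrative assumptions).
B, J, D = 2, 21, 64
tokens = torch.randn(B, J, D)
adj = torch.eye(J)
for i in range(J - 1):
    adj[i, i + 1] = 1.0
    adj[i + 1, i] = 1.0
adj = adj / adj.sum(dim=-1, keepdim=True)
out = GraphGuidedBiScanSketch(D)(tokens, adj)  # -> shape (2, 21, 64)
```

The point of the sketch is the structural idea the abstract emphasizes: a small, fixed set of joint tokens is first mixed along graph edges and then processed sequentially in both directions, rather than with full pairwise attention.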