Hand-Object Pose Estimation and Reconstruction Based on Signed Distance Field and Multiscale Feature Interaction

Bibliographic details
Published in: IEEE Transactions on Industrial Informatics, Vol. 20, No. 9, pp. 11242-11251, September 2024
Authors: Zhang, Xinkang; Dai, Xiaokun; Zhang, Ziqun; Di, Xinhan; Chen, Xinrong
Format: Article
Language: English
Description
Abstract: Reconstructing hands and objects from monocular color images has garnered considerable attention in recent years. In existing methods, parametric models are constructed at a single scale, and the interaction between hands and objects has not been fully explored. As a result, the multiscale information in 2D images cannot be fully exploited. In addition, the lack of feature fusion and the insufficient use of labels substantially limit reconstruction accuracy. To address these limitations, a new framework comprising three key modules is proposed. First, a multiscale feature extractor generates multiscale feature representations to capture the interaction between hand and object more effectively. Second, an attention-based bridge connects the hand and object representations, facilitating their integration. Finally, a token-merge module is introduced into the framework to provide a segmentation representation of the object. Experimental results on two datasets, ObMan and DexYCB, show that the proposed method performs well, achieving a shape error of about 0.121 cm² on ObMan and 0.40 cm² on DexYCB and outperforming state-of-the-art methods. This work may broaden the application prospects of human-computer interaction methods.
ISSN: 1551-3203, 1941-0050
DOI: 10.1109/TII.2024.3383542
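The abstract names an attention-based bridge between hand and object representations and a token-merge module, but gives no implementation details. For illustration only, here is a minimal plain-Python sketch of generic versions of both ideas. It shows single-head scaled dot-product cross-attention in which hand tokens attend over object tokens, assuming identity projections (no learned query, key, or value weights) and a residual add. It also shows a simplified greedy merge that repeatedly averages the most cosine-similar pair of tokens. The function names `cross_attention_bridge` and `merge_tokens` are hypothetical, and this is not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_bridge(queries, keys_values):
    """Hypothetical single-head cross-attention with a residual add.

    Each query token (e.g. a hand feature) attends over all key/value tokens
    (e.g. object features). Assumption: identity projections, so the raw
    object tokens serve as both keys and values.
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    fused = []
    for q in queries:
        # Scaled dot-product score between this query and every key.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys_values]
        weights = softmax(scores)
        # Attention-weighted sum of the value vectors, one dimension at a time.
        attended = [sum(w * v[j] for w, v in zip(weights, keys_values)) for j in range(d)]
        # Residual connection keeps the original query information.
        fused.append([qi + ai for qi, ai in zip(q, attended)])
    return fused


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def merge_tokens(tokens, r):
    """Hypothetical greedy merge: average the most similar pair, r times.

    A simplified stand-in for token merging, not the paper's module.
    """
    tokens = [list(t) for t in tokens]
    for _ in range(r):
        if len(tokens) < 2:
            break
        best = None
        for i in range(len(tokens)):
            for j in range(i + 1, len(tokens)):
                s = cosine(tokens[i], tokens[j])
                if best is None or s > best[0]:
                    best = (s, i, j)
        _, i, j = best
        merged = [(x + y) / 2 for x, y in zip(tokens[i], tokens[j])]
        # Drop the two merged tokens and append their average.
        tokens = [t for k, t in enumerate(tokens) if k not in (i, j)] + [merged]
    return tokens


hand = [[1.0, 0.0], [0.0, 1.0]]
obj = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
fused_hand = cross_attention_bridge(hand, obj)  # two fused 2-D hand tokens
obj_merged = merge_tokens(obj, 1)  # the two near-parallel object tokens become one
```

With a single object token, every attention weight is 1, so each fused hand token is simply the hand token plus that object token. A real model would add learned projections, multiple heads, and batched tensor operations.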