LGCANet: lightweight hand pose estimation network based on HRNet
Published in: | The Journal of supercomputing 2024-09, Vol.80 (13), p.19351-19373 |
Main authors: | , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Full text |
Abstract: | Hand pose estimation is a fundamental task in computer vision, with applications in virtual reality, gesture recognition, autonomous driving, and virtual surgery. Keypoint detection often relies on deep learning methods and high-resolution feature map representations to achieve accurate detection. The HRNet framework serves as the basis, but its high-resolution representations entail a large parameter count and high computational complexity. To mitigate these challenges, we propose a lightweight keypoint detection network called LGCANet (Lightweight Ghost-Coordinate Attention Network). The network consists primarily of a lightweight feature extraction head for initial feature extraction and multiple lightweight foundational modules called GCAblocks. A GCAblock uses linear transformations to generate redundant feature maps while simultaneously capturing inter-channel relationships and long-range positional information through a coordinate attention mechanism. Validation on the RHD and COCO-WholeBody-Hand datasets shows that LGCANet reduces the number of parameters by 65.9% and GFLOPs by 72.6% while preserving accuracy and improving detection speed. |
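The abstract describes GCAblocks as combining cheap linear transformations that generate redundant ("ghost") feature maps with a coordinate attention mechanism that pools along each spatial axis to retain positional information. The following NumPy sketch illustrates those two ideas only; it is not the authors' implementation, and the function names, the random weights, the per-channel scaling as the "cheap" operation, and the use of mean pooling are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for the illustrative random weights

def ghost_features(x, ratio=2):
    """Ghost-style feature generation (sketch): half the output channels come
    from a 'primary' 1x1 convolution, the rest from a cheap per-channel linear
    transformation of those primary features. x has shape (C, H, W)."""
    c, _, _ = x.shape
    primary_c = c // ratio
    # primary 1x1 conv across channels (random weights, illustration only)
    w1 = rng.standard_normal((primary_c, c)) * 0.1
    primary = np.tensordot(w1, x, axes=(1, 0))       # (primary_c, H, W)
    # cheap operation: per-channel scaling (stand-in for a depthwise conv)
    scale = rng.standard_normal((primary_c, 1, 1))
    ghost = primary * scale                          # (primary_c, H, W)
    return np.concatenate([primary, ghost], axis=0)  # back to (C, H, W)

def coordinate_attention(x):
    """Coordinate attention (sketch): pool along height and width separately,
    so the resulting attention weights keep long-range positional information
    along each axis, then reweight the input feature map."""
    sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))
    a_h = sigmoid(x.mean(axis=2, keepdims=True))     # (C, H, 1): height-wise
    a_w = sigmoid(x.mean(axis=1, keepdims=True))     # (C, 1, W): width-wise
    return x * a_h * a_w                             # position-aware reweighting
```

In the paper's design the two operations are combined inside each GCAblock, so the cheap feature generation keeps the block lightweight while the attention preserves the positional cues that heatmap-based keypoint detection depends on.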
ISSN: | 0920-8542 1573-0484 |
DOI: | 10.1007/s11227-024-06226-2 |