SPGformer: Serial-Parallel Hybrid GCN-Transformer With Graph-Oriented Encoder for 2-D-to-3-D Human Pose Estimation

Bibliographic details
Published in:	IEEE Transactions on Instrumentation and Measurement, 2024, Vol. 73, pp. 1-15
Main authors:	Fang, Qin; Xu, Zihan; Hu, Mengxian; Zeng, Qinyang; Liu, Chengju; Chen, Qijun
Format:	Article
Language:	English
Subjects:	2-D-to-3-D pose estimation; absolute pose; Algorithms; Artificial neural networks; Cameras; Coders; Constraint modelling; Datasets; graph convolutional network (GCN); Human activity recognition; Joints; Modules; Pose estimation; serial–parallel; Three-dimensional displays; transformer encoder; Transformers; Transmission line matrix methods; Vectors

Description
Abstract:	Accurate acquisition of 3-D human joint poses holds significant implications for tasks such as human action recognition. Monocular single-frame 2-D-to-3-D pose estimation focuses on establishing the correspondence between a 2-D human pose in a single image and its 3-D spatial pose, delegating the preliminary task of 2-D pose estimation to models better suited for processing pixel information. The difficulty of 2-D-to-3-D pose estimation lies in modeling the spatial constraints among joints. To better learn the structure between joints, this article proposes the SPGformer algorithm, constructed with stacked serial-parallel GCN-encoder (SPGEncoder) modules. This module forms a dual-branch framework composed of transformer encoders (Encoders) and graph-oriented encoders (GraEncoders). We recover concealed depth values from the 2-D coordinates of joints and feed them into the joint branch of the SPGEncoder. In parallel, we take the connection features of joints in the image as the vector branch input. The proposed GraEncoder module integrates a learnable graph convolutional network (GCN) prior to the Encoder, enabling the learning of a broader spectrum of joint connections within the confines of skeletal linkage. Furthermore, this article presents a methodology for calculating the 3-D absolute pose of the root node, filling a research gap for applications requiring precise human position. This nonlearnable, plug-and-play method has been validated on the Human3.6M dataset. The SPGformer algorithm outperforms state-of-the-art methods on both the Human3.6M and MPI-INF-3DHP datasets.
ISSN:	0018-9456, 1557-9662
DOI:	10.1109/TIM.2024.3381701
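
Illustrative sketch:	The abstract describes the GraEncoder as a learnable GCN placed before a transformer encoder. The NumPy sketch below is only one possible reading of that sentence, not the authors' implementation. Everything in it is an assumption made for illustration: the function names (`graph_conv`, `gra_encoder_sketch`), the additive learnable adjacency `adj_learned`, the single attention head, the residual connection, the layer sizes, and the toy five-joint chain skeleton. The record itself does not specify any of these details.

```python
import numpy as np


def normalize_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 of a skeleton graph."""
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def graph_conv(x, adj_norm, adj_learned, weight):
    """Graph convolution with a fixed skeleton graph plus a learnable offset
    (assumed form; the paper may parameterize the learnable graph differently)."""
    return (adj_norm + adj_learned) @ x @ weight


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over the joints."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def gra_encoder_sketch(x, adj, params):
    """Hypothetical GraEncoder-style block: GCN first, then attention."""
    h = graph_conv(x, normalize_adjacency(adj),
                   params["adj_learned"], params["w_gcn"])
    h = np.maximum(h, 0.0)                      # ReLU
    return h + self_attention(h, params["w_q"], params["w_k"], params["w_v"])


# Toy setup (assumed): 5 joints in a chain, inputs are (x, y, recovered depth).
rng = np.random.default_rng(0)
num_joints, in_dim, hid_dim = 5, 3, 8
adj = np.zeros((num_joints, num_joints))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    adj[i, j] = adj[j, i] = 1.0

params = {
    "adj_learned": np.zeros((num_joints, num_joints)),  # would be trained
    "w_gcn": rng.normal(scale=0.1, size=(in_dim, hid_dim)),
    "w_q": rng.normal(scale=0.1, size=(hid_dim, hid_dim)),
    "w_k": rng.normal(scale=0.1, size=(hid_dim, hid_dim)),
    "w_v": rng.normal(scale=0.1, size=(hid_dim, hid_dim)),
}
x = rng.normal(size=(num_joints, in_dim))
out = gra_encoder_sketch(x, adj, params)
print(out.shape)  # (5, 8)
```

In this sketch the graph convolution mixes features only along the skeleton (plus whatever `adj_learned` adds), while the attention step lets every joint attend to every other joint. That ordering is what "GCN prior to the Encoder" is taken to mean here.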