SMS-Net: Bridging the Gap Between High Accuracy and Low Computational Cost in Pose Estimation

Bibliographic details
Published in:	Applied Sciences 2024-11, Vol. 14 (22), p. 10143
Main authors:	Noh, Won-Jun; Moon, Ki-Ryum; Lee, Byoung-Dai
Format:	Article
Language:	English
Keywords:	Accuracy; Analysis; Computer vision; Costs; Datasets; Deep learning; Efficiency; Feature extraction; Human pose estimation; Internet of things; Lightweight models; Machine vision

Description
Abstract:	Human pose estimation identifies and classifies key joints of the human body in images or videos. Existing pose estimation methods can precisely capture human movements in real time but require substantial computation time and resources, which restricts their use in resource-constrained settings. We therefore propose SMS-Net, a lightweight pose estimation model based on the sequentially stacked structure of the hourglass network. The proposed model combines several lightweight techniques to enable high-speed pose estimation while requiring minimal storage and computation. Specifically, a shuffle-gated block is introduced in the encoder of each hourglass network to reduce the computational load and the number of parameters during feature extraction. A multi-dilation block is used in the decoder to obtain receptive fields at multiple scales without increasing the computational load. The proposed model was evaluated on the MPII and Common Objects in Context (COCO) pose estimation datasets using several performance metrics and compared with state-of-the-art lightweight pose estimation models. Furthermore, an ablation study was performed to assess the impact of each module on network performance and efficiency. The results demonstrate that the proposed model achieves a better balance between computational efficiency and performance than existing human pose estimation models. Overall, the findings can provide a basis for applications in computer vision technology.
ISSN:	2076-3417
DOI:	10.3390/app142210143
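The record does not describe the internals of either block. As a hedged illustration only, the Python sketch below shows two generic operations that the block names suggest. The first is a ShuffleNet-style channel shuffle. This is an assumption: the abstract does not confirm that the shuffle-gated block uses it. The second is the arithmetic behind dilated convolution, which shows how several dilation rates can widen the receptive field without adding weights. The function names channel_shuffle, effective_kernel_size and conv_weight_count and the dilation rates (1, 2, 4) are hypothetical and not taken from SMS-Net.

```python
"""Illustrative sketch only; not the SMS-Net implementation.

Assumptions (not stated in the abstract):
- the shuffle-gated block uses a ShuffleNet-style channel shuffle;
- the multi-dilation block uses parallel 3x3 convolutions with
  hypothetical dilation rates such as (1, 2, 4).
"""


def channel_shuffle(channels: list, groups: int) -> list:
    """Interleave channels across groups (ShuffleNet-style).

    Equivalent to reshaping to (groups, per_group), transposing to
    (per_group, groups), and flattening, so that the next grouped
    convolution sees channels from every group.
    """
    if groups <= 0 or len(channels) % groups != 0:
        raise ValueError("channel count must be divisible by a positive group count")
    per_group = len(channels) // groups
    return [channels[g * per_group + i] for i in range(per_group) for g in range(groups)]


def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span covered by one dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def conv_weight_count(kernel_size: int, in_ch: int, out_ch: int) -> int:
    """Weights in a standard 2-D convolution (bias omitted).

    Dilation does not appear in the count: a dilated kernel spreads
    the same k * k weights over a wider span.
    """
    return kernel_size * kernel_size * in_ch * out_ch


if __name__ == "__main__":
    print(channel_shuffle(list(range(6)), groups=2))  # [0, 3, 1, 4, 2, 5]
    for d in (1, 2, 4):  # hypothetical dilation rates
        # dilation, effective span, weights for a 64 -> 64 channel 3x3 conv
        print(d, effective_kernel_size(3, d), conv_weight_count(3, 64, 64))
```

With a 3×3 kernel, dilation rates 1, 2 and 4 cover 3×3, 5×5 and 9×9 spans, while each branch keeps 36,864 weights for 64 input and 64 output channels. How SMS-Net actually arranges or fuses such branches is not specified in this record.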