A strong benchmark for yoga action recognition based on lightweight pose estimation model


Bibliographic details
Published in: Multimedia Systems 2025, Vol. 31 (1)
Authors: Zhou, Liangtai; Zhang, Weiwei; Zhang, Banghui; Li, Xiaobin; Zhu, Jianqing
Format: Article
Language: English
Subjects:
Online access: Full text
Description
Summary: Yoga action recognition is crucial for enabling precise motion analysis and providing effective training guidance, which in turn facilitates the optimization of physical health and skill enhancement. However, current methods struggle to maintain high accuracy and real-time performance when dealing with complex poses and occlusions. Additionally, these methods neglect the dynamic characteristics and temporal sequence information inherent in yoga actions. Therefore, this paper proposes a two-stage action recognition method tailored for yoga scenarios. The method initially employs pose estimation technology based on knowledge distillation to optimize the accuracy and efficiency of lightweight models in detecting complex poses and occlusions. Subsequently, a lightweight 3D convolutional neural network (3D-CNN) is utilized for action recognition, achieving seamless integration of the two stages through heat maps, thereby enhancing recognition accuracy and precisely capturing spatiotemporal features in video sequences. Experimental results indicate that on the COCO dataset, the DistillPose-m model achieves a 2.5% improvement in Average Precision (AP) compared to RTMPose-m. In the yoga action recognition task, our model exhibits an improvement of approximately 2% over traditional Graph Convolutional Network (GCN) methods on both the Deepyoga and 3Dyoga90 datasets. This study enhances the performance and accuracy of pose estimation in yoga scenarios, addressing the challenges of bodily occlusions and complex postures. By fully leveraging the spatiotemporal information inherent in yoga movements, it improves the accuracy of yoga action recognition. This research provides critical insights and support for motion training and analysis systems in other dynamic activities, such as martial arts and dance.
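The abstract's key integration idea, passing pose-estimation heat maps to a 3D-CNN, can be illustrated with a minimal sketch. The snippet below is not the paper's implementation: the helper names (`keypoint_heatmap`, `heatmap_volume`), the heat-map resolution, and the Gaussian rendering are all illustrative assumptions. It only shows how per-frame, per-joint heat maps can be stacked into a spatiotemporal volume of the kind a 3D-CNN could consume.

```python
import numpy as np

def keypoint_heatmap(x, y, h=64, w=64, sigma=2.0):
    """Render one keypoint as a 2D Gaussian heat map (hypothetical helper)."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))

def heatmap_volume(frames_keypoints, h=64, w=64):
    """Stack heat maps for T frames and J joints into a (T, J, H, W) array,
    i.e. a spatiotemporal input volume for an action-recognition 3D-CNN."""
    T = len(frames_keypoints)
    J = len(frames_keypoints[0])
    vol = np.zeros((T, J, h, w), dtype=np.float32)
    for t, joints in enumerate(frames_keypoints):
        for j, (x, y) in enumerate(joints):
            vol[t, j] = keypoint_heatmap(x, y, h, w)
    return vol

# Toy clip: 4 frames, 3 joints, one joint drifting to mimic motion over time.
clip = [[(10 + t, 20), (30, 30 + t), (50, 50)] for t in range(4)]
vol = heatmap_volume(clip)
print(vol.shape)  # (4, 3, 64, 64)
```

The heat-map representation keeps spatial uncertainty (the Gaussian spread) rather than collapsing each joint to a coordinate pair, which is one plausible reason this coupling between the two stages can help a convolutional recognizer.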
ISSN: 0942-4962
ISSN: 1432-1882
DOI: 10.1007/s00530-024-01646-9