XYZ-channel encoding and augmentation of human joint skeleton coordinates for end-to-end action recognition

Bibliographic details
Published in:	Signal, image and video processing, 2024-11, Vol.18 (11), p.7857-7871
Main authors:	Elaoud, Amani; Ghazouani, Haythem; Barhoumi, Walid
Format:	Article
Language:	eng
Subjects:	Activity recognition; Coding; Computer Imaging; Computer Science; Data augmentation; Datasets; Human performance; Image enhancement; Image Processing and Computer Vision; Multimedia Information Systems; Original Paper; Pattern recognition; Pattern Recognition and Graphics; Signal, Image and Speech Processing; Spatiotemporal data; Vision

container_end_page	7871
container_issue	11
container_start_page	7857
container_title	Signal, image and video processing
container_volume	18
creator	Elaoud, Amani; Ghazouani, Haythem; Barhoumi, Walid
description	Recognizing human actions from skeletal data remains a major challenge: existing approaches do not always achieve optimal performance because of their limited ability to discern the spatio-temporal patterns inherent in skeletal data. This study aims to improve action-recognition accuracy by representing each action as a 3D matrix that captures its spatio-temporal dynamics in image form. These matrices encapsulate the evolution of the skeletal joint coordinates (x, y, and z) over time, providing a holistic representation of human actions. Treating these 3D matrices as three-channel images makes it possible to exploit the rich spatio-temporal information they contain. The proposed XYZ-channel action encoding also facilitates the application of data augmentation techniques, thereby improving model generalization and robustness. Furthermore, we present a customized CNN architecture designed to efficiently extract spatio-temporal features from XYZ-channel-encoded actions and classify them accurately. Extensive experiments on diverse datasets, including MSR Action3D, UTD-MHAD and CZU-MHAD, demonstrate the effectiveness of the proposed CNN architecture. We achieve test-set accuracies of 96% on MSR Action3D, 97.9% on UTD-MHAD and 98% on CZU-MHAD, underlining the method's ability to accurately recognize human actions from skeletal data in challenging scenarios.
doi_str_mv	10.1007/s11760-024-03434-4
format	Article
publisher	London: Springer London
rights	The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2024. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
stitle	SIViP
toplevel	peer_reviewed
fulltext	fulltext
identifier	ISSN: 1863-1703
ispartof	Signal, image and video processing, 2024-11, Vol.18 (11), p.7857-7871
issn	1863-1703 1863-1711
language	eng
recordid	cdi_proquest_journals_3104476141
source	SpringerLink Journals - AutoHoldings
subjects	Activity recognition; Coding; Computer Imaging; Computer Science; Data augmentation; Datasets; Human performance; Image enhancement; Image Processing and Computer Vision; Multimedia Information Systems; Original Paper; Pattern recognition; Pattern Recognition and Graphics; Signal, Image and Speech Processing; Spatiotemporal data; Vision
title	XYZ-channel encoding and augmentation of human joint skeleton coordinates for end-to-end action recognition
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-24T18%3A17%3A11IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=XYZ-channel%20encoding%20and%20augmentation%20of%20human%20joint%20skeleton%20coordinates%20for%20end-to-end%20action%20recognition&rft.jtitle=Signal,%20image%20and%20video%20processing&rft.au=Elaoud,%20Amani&rft.date=2024-11-01&rft.volume=18&rft.issue=11&rft.spage=7857&rft.epage=7871&rft.pages=7857-7871&rft.issn=1863-1703&rft.eissn=1863-1711&rft_id=info:doi/10.1007/s11760-024-03434-4&rft_dat=%3Cproquest_cross%3E3104476141%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3104476141&rft_id=info:pmid/&rfr_iscdi=true
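
The record's abstract describes the XYZ-channel encoding only at a high level, so the following is a minimal, dependency-free Python sketch of the general idea under stated assumptions, not the authors' implementation. It assumes the input is a list of frames, each holding J joints as (x, y, z) tuples, and it maps coordinate c of joint j at frame t to pixel [c][j][t] (x, y and z become channels 0, 1 and 2; joints are rows; frames are columns). The per-channel min-max scaling to 0 to 255, the random temporal crop-and-resize augmentation, and the function names `encode_xyz` and `temporal_crop_resize` are all hypothetical choices made for illustration.

```python
# Hypothetical sketch of an XYZ-channel skeleton encoding; NOT the paper's code.
# Assumptions: frames x joints x (x, y, z) input, joints as rows, frames as
# columns, per-channel min-max scaling to 0..255, and a random temporal
# crop + nearest-neighbour resize as the example augmentation.
import random
from typing import List, Sequence

Frame = Sequence[Sequence[float]]  # one frame: J joints, each (x, y, z)
Image = List[List[List[int]]]      # channel -> row (joint) -> column (frame)


def encode_xyz(sequence: Sequence[Frame]) -> Image:
    """Hypothetical helper: coordinate c of joint j at frame t -> pixel [c][j][t]."""
    num_frames = len(sequence)
    num_joints = len(sequence[0])
    image: Image = []
    for c in range(3):  # channel 0 = x, 1 = y, 2 = z
        values = [sequence[t][j][c] for t in range(num_frames) for j in range(num_joints)]
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0  # constant channel: avoid division by zero
        channel = [
            [round(255 * (sequence[t][j][c] - lo) / span) for t in range(num_frames)]
            for j in range(num_joints)
        ]
        image.append(channel)
    return image


def temporal_crop_resize(image: Image, keep: float, rng: random.Random) -> Image:
    """Hypothetical augmentation: crop a random window of frame columns
    covering about `keep` of the width, then stretch it back to full width."""
    width = len(image[0][0])
    crop = max(1, int(width * keep))
    start = rng.randint(0, width - crop)  # randint bounds are inclusive
    # Nearest-neighbour map from each output column to a column in the window.
    cols = [start + (i * crop) // width for i in range(width)]
    return [[[row[k] for k in cols] for row in channel] for channel in image]


if __name__ == "__main__":
    # Toy sequence: 4 frames, 2 joints.
    seq = [[(t, 2 * t, 0.5), (t + 1, -t, 0.5)] for t in range(4)]
    img = encode_xyz(seq)
    print(len(img), len(img[0]), len(img[0][0]))  # 3 2 4
    aug = temporal_crop_resize(img, keep=0.5, rng=random.Random(0))
```

Placing joints on one axis and frames on the other is one way to let an ordinary 2D convolution mix neighbouring joints and neighbouring time steps in a single kernel. Nested lists are used here only to keep the sketch self-contained; a practical pipeline would more likely work on NumPy arrays or framework tensors.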