Multimodal Human Action Recognition Framework Using an Improved CNNGRU Classifier

Activity recognition from multiple sensors is a promising research area with various applications for remote human activity tracking in surveillance systems. Human activity recognition (HAR) aims to identify human actions and assign descriptors using diverse data modalities such as skeleton, RGB, depth, infrared, inertial, audio, Wi-Fi, and radar. This paper introduces a novel HAR system for multi-sensor surveillance that incorporates RGB, RGB-D, and inertial sensors. The process involves framing and segmenting the multi-sensor data, reducing noise and inconsistencies through filtration, and extracting novel features, which are then transformed into a feature matrix. The novel features comprise the dynamic likelihood random field (DLRF), angle along the sagittal plane (ASP), Lagregression (LR), and Gammatone cepstral coefficients (GCC). A genetic algorithm then merges and refines this matrix by eliminating redundant information. Finally, the fused data is classified with an improved Convolutional Neural Network-Gated Recurrent Unit (CNNGRU) classifier to recognize specific human actions. Experimental evaluation using leave-one-subject-out (LOSO) cross-validation on the Berkeley-MHAD, HWU-USP, UTD-MHAD, NTU-RGB+D60, and NTU-RGB+D120 benchmark datasets shows that the proposed system outperforms existing state-of-the-art techniques, with accuracies of 97.91%, 97.99%, 97.90%, 96.61%, and 95.94%, respectively.
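The feature-extraction stage can be illustrated with a short sketch of one of the named features, the Gammatone cepstral coefficients: a bank of gammatone filters is applied to a 1-D signal, the log energy of each band is taken, and a DCT decorrelates the band energies into cepstral coefficients. The function name, filter count, frequency range, and number of coefficients below are illustrative assumptions; the abstract does not give the paper's exact parameters.

```python
import numpy as np
from scipy.signal import gammatone, lfilter
from scipy.fft import dct

def gcc_features(signal, fs, n_filters=32, n_coeffs=13, fmin=50.0):
    """Gammatone cepstral coefficients: filterbank -> log band energy -> DCT."""
    fmax = 0.45 * fs                              # stay below the Nyquist frequency
    freqs = np.geomspace(fmin, fmax, n_filters)   # log-spaced center frequencies
    energies = np.empty(n_filters)
    for i, fc in enumerate(freqs):
        b, a = gammatone(fc, 'iir', fs=fs)        # 4th-order digital gammatone filter
        band = lfilter(b, a, signal)
        energies[i] = np.log(np.mean(band ** 2) + 1e-12)  # epsilon-safe log energy
    return dct(energies, norm='ortho')[:n_coeffs]         # keep the leading coefficients
```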
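For the genetic-algorithm fusion step, the following is a minimal sketch of feature selection over a binary inclusion mask, with fitness measured as the cross-validated accuracy of a cheap stand-in classifier (k-NN). The population size, tournament selection, single-point crossover, and bit-flip mutation rate are common GA defaults assumed here, not the paper's settings.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def fitness(mask, X, y):
    """Fitness of a feature subset: cross-validated accuracy, 0 if the subset is empty."""
    if mask.sum() == 0:
        return 0.0
    clf = KNeighborsClassifier(n_neighbors=3)
    return cross_val_score(clf, X[:, mask.astype(bool)], y, cv=3).mean()

def ga_select(X, y, pop_size=20, generations=30, p_mut=0.05):
    """Evolve a population of binary masks; return the best mask found."""
    n = X.shape[1]
    pop = rng.integers(0, 2, size=(pop_size, n))
    for _ in range(generations):
        scores = np.array([fitness(ind, X, y) for ind in pop])
        # Tournament selection: keep the better of two randomly drawn individuals.
        idx = rng.integers(0, pop_size, size=(pop_size, 2))
        parents = pop[np.where(scores[idx[:, 0]] >= scores[idx[:, 1]],
                               idx[:, 0], idx[:, 1])]
        # Single-point crossover between consecutive parents.
        cut = rng.integers(1, n, size=pop_size)
        children = parents.copy()
        for i in range(0, pop_size - 1, 2):
            c = cut[i]
            children[i, c:], children[i + 1, c:] = parents[i + 1, c:], parents[i, c:]
        # Bit-flip mutation.
        flip = rng.random(children.shape) < p_mut
        children[flip] ^= 1
        pop = children
    scores = np.array([fitness(ind, X, y) for ind in pop])
    return pop[scores.argmax()].astype(bool)
```

Applying the returned mask (X[:, mask]) yields the refined feature matrix that the classifier consumes.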
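The abstract does not describe the internals of the improved CNNGRU, so the PyTorch sketch below only shows the generic pattern such a classifier follows: 1-D convolutions extract local temporal patterns, a GRU models longer-range dynamics, and a linear head produces class logits. All layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class CNNGRU(nn.Module):
    """Conv1d blocks learn local temporal patterns; a GRU captures longer-range
    dynamics; a linear head maps the final hidden state to action-class logits."""
    def __init__(self, n_features, n_classes, hidden=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_features, 64, kernel_size=5, padding=2),
            nn.BatchNorm1d(64),
            nn.ReLU(),
            nn.MaxPool1d(2),                       # halve the time axis
            nn.Conv1d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm1d(128),
            nn.ReLU(),
        )
        self.gru = nn.GRU(128, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                          # x: (batch, time, features)
        z = self.conv(x.transpose(1, 2))           # -> (batch, 128, time/2)
        out, _ = self.gru(z.transpose(1, 2))       # -> (batch, time/2, hidden)
        return self.head(out[:, -1])               # logits from the last time step
```

An input batch shaped (batch, time, features), e.g. torch.randn(8, 100, 60), yields logits shaped (8, n_classes).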
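Leave-one-subject-out evaluation trains on all subjects except one, tests on the held-out subject, and cycles through every subject before averaging the per-subject accuracies; scikit-learn's LeaveOneGroupOut implements exactly this split when subject IDs are passed as groups. The random-forest classifier and the function name below are stand-ins for the full pipeline, not the paper's setup.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

def loso_accuracy(X, y, subject_ids, make_clf=lambda: RandomForestClassifier()):
    """Average accuracy over leave-one-subject-out folds."""
    logo = LeaveOneGroupOut()
    accs = []
    for train, test in logo.split(X, y, groups=subject_ids):
        clf = make_clf()                    # fresh classifier per fold
        clf.fit(X[train], y[train])
        accs.append(accuracy_score(y[test], clf.predict(X[test])))
    return float(np.mean(accs))
```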


Bibliographic Details
Published in: IEEE Access, 2024, Vol. 12, pp. 158388-158406
Main authors: Batool, Mouazma; Alotaibi, Moneerah; Alotaibi, Sultan Refa; Alhammadi, Dina Abdulaziz; Jamal, Muhammad Asif; Jalal, Ahmad; Lee, Bumshik
Format: Article
Language: English
Online access: Full text
DOI: 10.1109/ACCESS.2024.3481631
ISSN: 2169-3536
Publisher: IEEE, Piscataway
Source: IEEE Open Access Journals; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek (freely accessible e-journals)
Subjects:
Accuracy
Artificial neural networks
Audio data
Computational modeling
Convolutional neural network
Convolutional neural networks
Deep learning
depth camera
Face recognition
Feature extraction
Fields (mathematics)
Genetic algorithms
human action recognition
Human activity recognition
Inertial sensing devices
inertial sensors
Infrared tracking
multi-sensors
RGB
Sensors
Surveillance
Surveillance radar
Surveillance systems
Wearable sensors