Multimodal Human Action Recognition Framework Using an Improved CNNGRU Classifier
Activity recognition from multiple sensors is a promising research area with various applications for remote human activity tracking in surveillance systems. Human activity recognition (HAR) aims to identify human actions and assign descriptors using diverse data modalities such as skeleton, RGB, depth, infrared, inertial, audio, Wi-Fi, and radar.
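The classifier named in the title is a CNN-GRU hybrid applied to fused multi-sensor feature sequences. The paper's exact "improved" architecture is not reproduced in this record, so the following PyTorch sketch only illustrates the general pattern: a 1-D CNN over windowed feature sequences, a GRU over the resulting temporal features, and a linear classification head. All layer sizes, the window length, and the class count are illustrative assumptions, not the authors' settings.

```python
import torch
import torch.nn as nn

class CNNGRUClassifier(nn.Module):
    """Generic CNN-GRU action classifier (illustrative, not the paper's exact model).

    Input: a batch of fused feature sequences of shape (batch, time, features).
    A 1-D CNN extracts local temporal patterns, a GRU models longer-range
    dynamics, and a linear head predicts the action class.
    """

    def __init__(self, num_features: int, num_classes: int, hidden: int = 128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv1d(num_features, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(64, 128, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.gru = nn.GRU(input_size=128, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, features); Conv1d expects (batch, channels, time)
        z = self.cnn(x.transpose(1, 2))
        # back to (batch, time, channels) for the GRU
        out, _ = self.gru(z.transpose(1, 2))
        # classify from the last time step
        return self.head(out[:, -1, :])

# Example: 32 windows of 100 time steps with 48 fused features, 27 action classes (all assumed values)
model = CNNGRUClassifier(num_features=48, num_classes=27)
logits = model(torch.randn(32, 100, 48))
print(logits.shape)  # torch.Size([32, 27])
```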
Saved in:
Published in: | IEEE Access, 2024, Vol. 12, p. 158388-158406 |
---|---|
Main authors: | Batool, Mouazma; Alotaibi, Moneerah; Alotaibi, Sultan Refa; Alhammadi, Dina Abdulaziz; Jamal, Muhammad Asif; Jalal, Ahmad; Lee, Bumshik |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 158406 |
---|---|
container_issue | |
container_start_page | 158388 |
container_title | IEEE access |
container_volume | 12 |
creator | Batool, Mouazma; Alotaibi, Moneerah; Alotaibi, Sultan Refa; Alhammadi, Dina Abdulaziz; Jamal, Muhammad Asif; Jalal, Ahmad; Lee, Bumshik |
description | Activity recognition from multiple sensors is a promising research area with various applications for remote human activity tracking in surveillance systems. Human activity recognition (HAR) aims to identify human actions and assign descriptors using diverse data modalities such as skeleton, RGB, depth, infrared, inertial, audio, Wi-Fi, and radar. This paper introduces a novel HAR system for multi-sensor surveillance, incorporating RGB, RGB-D, and inertial sensors. The process involves framing and segmenting multi-sensor data, reducing noise and inconsistencies through filtration, and extracting novel features, which are then transformed into a matrix. The novel features include the dynamic likelihood random field (DLRF), angle along the sagittal plane (ASP), Lagregression (LR), and Gammatone cepstral coefficients (GCC). Additionally, a genetic algorithm is used to merge and refine this matrix by eliminating redundant information. The fused data is finally classified with an improved Convolutional Neural Network-Gated Recurrent Unit (CNNGRU) classifier to recognize specific human actions. Experimental evaluation using leave-one-subject-out (LOSO) cross-validation on the Berkeley-MHAD, HWU-USP, UTD-MHAD, NTU-RGB+D60, and NTU-RGB+D120 benchmark datasets demonstrates that the proposed system outperforms existing state-of-the-art techniques, with accuracies of 97.91%, 97.99%, 97.90%, 96.61%, and 95.94%, respectively. |
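The abstract names an "angle along sagittal plane" (ASP) feature but does not define it in this record. One plausible reading is the orientation of a skeleton segment projected onto the sagittal plane; the sketch below assumes a coordinate frame with +z up and +y forward and a hip-to-knee segment purely for illustration.

```python
import numpy as np

def sagittal_plane_angle(proximal, distal):
    """Angle (degrees) of a body segment within the sagittal plane.

    `proximal` and `distal` are 3-D joint positions (e.g. hip and knee) in a
    frame where +z points up and +y points forward (assumed convention). The
    segment is projected onto the sagittal plane and measured from vertical.
    This is one plausible reading of the ASP feature, not the paper's exact
    definition.
    """
    seg = np.asarray(distal, float) - np.asarray(proximal, float)
    forward, up = seg[1], seg[2]  # sagittal-plane components under the assumed axes
    return np.degrees(np.arctan2(forward, up))

# Example: hip at the origin, knee slightly forward of and below the hip
print(sagittal_plane_angle([0, 0, 0], [0.0, 0.1, -0.4]))  # roughly 166 degrees from "up"
```

The genetic-algorithm step merges the feature matrix and discards redundant columns. The paper's exact chromosome encoding and fitness function are not given here, so this is a minimal sketch of GA-based feature selection that assumes a boolean mask per individual and a caller-supplied `fitness(mask)` score (which should guard against an all-False mask).

```python
import numpy as np

rng = np.random.default_rng(0)

def ga_select(X, y, fitness, pop_size=20, generations=30, p_mut=0.05):
    """Toy genetic algorithm for feature selection (illustrative only).

    Each individual is a boolean mask over feature columns; `fitness(mask)`
    should return a score such as validation accuracy of a cheap classifier
    trained on X[:, mask]. Selection keeps the best half, offspring come from
    uniform crossover plus bit-flip mutation.
    """
    n_features = X.shape[1]
    pop = rng.random((pop_size, n_features)) < 0.5
    for _ in range(generations):
        scores = np.array([fitness(mask) for mask in pop])
        order = np.argsort(scores)[::-1]
        parents = pop[order[: pop_size // 2]]      # elitist selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            cross = rng.random(n_features) < 0.5   # uniform crossover
            child = np.where(cross, a, b)
            flip = rng.random(n_features) < p_mut  # bit-flip mutation
            children.append(child ^ flip)
        pop = np.vstack([parents, children])
    scores = np.array([fitness(mask) for mask in pop])
    return pop[np.argmax(scores)]

# Example fitness: mean 3-fold accuracy of a small k-NN on the selected columns
# from sklearn.model_selection import cross_val_score
# from sklearn.neighbors import KNeighborsClassifier
# fit = lambda m: 0.0 if not m.any() else cross_val_score(
#     KNeighborsClassifier(3), X[:, m], y, cv=3).mean()
# best_mask = ga_select(X, y, fit)
```

Evaluation uses leave-one-subject-out (LOSO) cross-validation: every subject's recordings are held out once while the model is trained on the remaining subjects, and the per-subject accuracies are averaged. A generic sketch with scikit-learn's `LeaveOneGroupOut` follows; the k-NN stand-in model is an assumption, not the paper's classifier.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

def loso_accuracy(X, y, subject_ids, make_model=lambda: KNeighborsClassifier(3)):
    """Leave-one-subject-out evaluation: each subject is held out exactly once."""
    logo = LeaveOneGroupOut()
    scores = []
    for train_idx, test_idx in logo.split(X, y, groups=subject_ids):
        model = make_model()
        model.fit(X[train_idx], y[train_idx])
        pred = model.predict(X[test_idx])
        scores.append(accuracy_score(y[test_idx], pred))
    return float(np.mean(scores))
```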
doi_str_mv | 10.1109/ACCESS.2024.3481631 |
format | Article |
publisher | Piscataway: IEEE |
rights | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024 |
ieee_id | 10719991 |
doaj_id | oai_doaj_org_article_d7c235cdc17646caae5b4a44113c5dd4 |
orcidid | 0000-0002-2134-5388; 0009-0000-8421-8477; 0000-0002-0074-8153; 0000-0003-2482-1869 |
fulltext | fulltext |
identifier | ISSN: 2169-3536 |
ispartof | IEEE access, 2024, Vol.12, p.158388-158406 |
issn | 2169-3536 2169-3536 |
language | eng |
recordid | cdi_crossref_primary_10_1109_ACCESS_2024_3481631 |
source | IEEE Open Access Journals; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Freely Accessible E-Journals |
subjects | Accuracy; Artificial neural networks; Audio data; Computational modeling; Convolutional neural network; Convolutional neural networks; Deep learning; depth camera; Face recognition; Feature extraction; Fields (mathematics); Genetic algorithms; human action recognition; Human activity recognition; Inertial sensing devices; inertial sensors; Infrared tracking; multi-sensors; RGB; Sensors; Surveillance; Surveillance radar; Surveillance systems; Wearable sensors |
title | Multimodal Human Action Recognition Framework Using an Improved CNNGRU Classifier |