U-ActionNet: Dual-Pathway Fourier Networks With Region-of-Interest Module for Efficient Action Recognition in UAV Surveillance

Bibliographic details
Published in: IEEE Access, 2024, Vol. 12, pp. 189547-189563
Main authors: Monaf Chowdhury, Abdul; Imran, Ahsan; Hasan, Md Mehedi; Ahmed, Riad; Azad, Akm; Alyami, Salem A.
Format: Article
Language: English
Online access: Full text
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2024.3516586

Abstract
Unmanned Aerial Vehicles (UAVs) have revolutionized human action recognition by offering a bird's-eye perspective, unlocking unprecedented potential for comprehensive support within surveillance systems. This paper presents synergistic strategies for enhancing Human Action Recognition (HAR) in UAV imagery. Leveraging a dual-pathway approach, we propose a novel framework, U-ActionNet, that integrates FFT-based substance separation and space-time self-attention techniques to improve the accuracy and efficiency of HAR tasks. Catering to the server side, the first pathway employs a modified C3D model with Fast Fourier Transform (FFT)-based object movement and attention detection mechanisms to effectively extract human actors from complex scenes and capture spatiotemporal dynamics in UAV footage. A generalized Region-of-Interest (ROI) module further concentrates the model on the most informative regions to enhance target recognition. Through extensive experiments on the Drone Action and UAV-Human datasets, we demonstrate the effectiveness of our approach, achieving superior performance compared to state-of-the-art methods with Top-1 accuracies of 94.94% and 95.05%, respectively. The second pathway targets edge devices, integrating an ROI extraction-based frame sampling technique that eliminates static frames while preserving the pivotal frames essential for model training. We introduce a lightweight model, U-ActionNet Light, which combines MobileNetV2, a Fourier module, and a BiLSTM, and requires only one-ninth the parameters of the server-side model. Demonstrating its efficacy, this model attains Top-1 accuracies of 80.43% and 84.74% on the same datasets, surpassing the baseline by a significant margin. Experimental results show that the presented frameworks are promising for surveillance, search and rescue, and activity monitoring, where accurate, real-time human action recognition from UAV platforms is essential.
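
The abstract describes FFT-based object movement and attention detection but not its internals. As a rough illustration of the idea, the NumPy sketch below high-pass filters two consecutive frames in the frequency domain and differences them, so that static background (concentrated in low frequencies) is suppressed and moving actors stand out. The function name fft_motion_map and the cutoff parameter are hypothetical; the filtering and differencing scheme is an assumption, not the paper's published mechanism.

```python
# Minimal sketch (assumed, not the paper's exact mechanism): suppress
# low-frequency background in the Fourier domain, then difference two
# consecutive frames to highlight moving structure.
import numpy as np

def fft_motion_map(prev_frame: np.ndarray, frame: np.ndarray,
                   cutoff: int = 8) -> np.ndarray:
    """prev_frame, frame: grayscale H x W float arrays. Returns a
    saliency map in [0, 1] that emphasizes moving content."""
    f_prev = np.fft.fftshift(np.fft.fft2(prev_frame))
    f_curr = np.fft.fftshift(np.fft.fft2(frame))
    # Zero out a small central (low-frequency) block; these
    # coefficients mostly encode the static background of the scene.
    h, w = frame.shape
    mask = np.ones((h, w))
    mask[h // 2 - cutoff:h // 2 + cutoff,
         w // 2 - cutoff:w // 2 + cutoff] = 0.0
    # Differencing the high-pass spectra isolates content that changed
    # between the two frames, i.e. candidate moving actors.
    diff = (f_curr - f_prev) * mask
    motion = np.abs(np.fft.ifft2(np.fft.ifftshift(diff)))
    return motion / (motion.max() + 1e-8)
```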
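The second pathway's frame sampling eliminates static frames before training. A simple stand-in for that behavior, keeping a frame only if it differs sufficiently from the last kept frame, is sketched below; the paper derives its sampling from ROI extraction, so the plain mean-difference test, the function name sample_dynamic_frames, and the threshold value are all simplifying assumptions.

```python
# Illustrative frame sampler (assumption: mean absolute difference as
# a proxy for the paper's ROI-based motion test). Static frames change
# little between timesteps and are discarded.
import numpy as np

def sample_dynamic_frames(frames, threshold=4.0):
    """frames: sequence of grayscale H x W arrays; returns the subset
    that shows enough change relative to the last kept frame."""
    kept = [frames[0]]
    for frame in frames[1:]:
        delta = np.abs(frame.astype(np.float32)
                       - kept[-1].astype(np.float32)).mean()
        if delta > threshold:
            kept.append(frame)
    return kept
```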
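For U-ActionNet Light, the abstract names MobileNetV2, a Fourier module, and BiLSTM as the building blocks. The PyTorch sketch below wires a MobileNetV2 backbone into a BiLSTM classifier to show the general shape of such a pipeline; it omits the Fourier module, and the class name UActionNetLightSketch, the hidden size, and the use of the final timestep for classification are assumptions rather than the published architecture.

```python
# Hypothetical MobileNetV2 + BiLSTM clip classifier (the paper's
# Fourier module is omitted; layer sizes are assumed).
import torch
import torch.nn as nn
from torchvision.models import mobilenet_v2

class UActionNetLightSketch(nn.Module):
    def __init__(self, num_classes: int, hidden: int = 256):
        super().__init__()
        # Per-frame convolutional features (1280 output channels).
        self.backbone = mobilenet_v2(weights=None).features
        self.pool = nn.AdaptiveAvgPool2d(1)
        # Bidirectional LSTM models the temporal dimension across frames.
        self.temporal = nn.LSTM(1280, hidden, batch_first=True,
                                bidirectional=True)
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, time, 3, H, W)
        b, t, c, h, w = clips.shape
        feats = self.backbone(clips.reshape(b * t, c, h, w))
        feats = self.pool(feats).reshape(b, t, -1)   # (b, t, 1280)
        seq, _ = self.temporal(feats)                # (b, t, 2 * hidden)
        return self.head(seq[:, -1])                 # logits per clip
```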

Source: IEEE Open Access Journals; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals

Subjects:
Accuracy
action recognition
Attention
Autonomous aerial vehicles
Cameras
Computational efficiency
Computational modeling
Datasets
Deep learning
Drones
Effectiveness
Fast Fourier transformations
Fast Fourier transforms
FFT
Fourier transforms
Human activity recognition
Human motion
Modules
Motion perception
optimization
Real time
Servers
Surveillance
Surveillance systems
Target recognition
Task complexity
tracking
Unmanned aerial vehicles
video surveillance