U-ActionNet: Dual-Pathway Fourier Networks With Region-of-Interest Module for Efficient Action Recognition in UAV Surveillance

Bibliographic details
Published in: IEEE Access, 2024, Vol. 12, pp. 189547-189563
Main authors: Monaf Chowdhury, Abdul; Imran, Ahsan; Hasan, Md Mehedi; Ahmed, Riad; Azad, Akm; Alyami, Salem A.
Format: Article
Language: English
Online access: Full text
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2024.3516586

Abstract
Unmanned Aerial Vehicles (UAVs) have revolutionized human action recognition by offering a bird's-eye perspective, unlocking unprecedented potential for comprehensive support within surveillance systems. This paper presents synergistic strategies for enhancing Human Action Recognition (HAR) in UAV imagery. Leveraging a dual-pathway approach, we propose a novel framework, U-ActionNet, that integrates FFT-based substance separation and space-time self-attention techniques to improve the accuracy and efficiency of HAR tasks. Catering to the server side, the first pathway employs a modified C3D model with Fast Fourier Transform (FFT)-based object movement and attention detection mechanisms to effectively extract human actors from complex scenes and capture spatiotemporal dynamics in UAV footage. A generalized Region-of-Interest (ROI) module further concentrates the model on the most informative regions to enhance target recognition. Through extensive experiments on the Drone Action and UAV-Human datasets, we demonstrate the effectiveness of our approach, achieving superior performance compared to state-of-the-art methods with Top-1 accuracies of 94.94% and 95.05%, respectively. The second pathway targets edge devices, integrating an ROI extraction-based frame sampling technique that eliminates static frames while preserving the pivotal frames essential for model training. We introduce a lightweight model, U-ActionNet Light, which combines MobileNetV2, a Fourier module, and a BiLSTM, and requires only one-ninth the parameters of the server-side model. Demonstrating its efficacy, this model attains Top-1 accuracies of 80.43% and 84.74% on the same datasets, surpassing the baseline by a significant margin. Experimental results show that the presented frameworks are promising for surveillance, search and rescue, and activity monitoring, where accurate, real-time human action recognition from UAV platforms is essential.
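
The abstract describes FFT-based object movement and attention detection but not its internals. As a rough illustration of the idea, the NumPy sketch below high-pass filters two consecutive frames in the frequency domain and differences them, so that static background (concentrated in low frequencies) is suppressed and moving actors stand out. The function name fft_motion_map and the cutoff parameter are hypothetical; the filtering and differencing scheme is an assumption, not the paper's published mechanism.

```python
# Minimal sketch (assumed, not the paper's exact mechanism): suppress
# low-frequency background in the Fourier domain, then difference two
# consecutive frames to highlight moving structure.
import numpy as np

def fft_motion_map(prev_frame: np.ndarray, frame: np.ndarray,
                   cutoff: int = 8) -> np.ndarray:
    """prev_frame, frame: grayscale H x W float arrays. Returns a
    saliency map in [0, 1] that emphasizes moving content."""
    f_prev = np.fft.fftshift(np.fft.fft2(prev_frame))
    f_curr = np.fft.fftshift(np.fft.fft2(frame))
    # Zero out a small central (low-frequency) block; these
    # coefficients mostly encode the static background of the scene.
    h, w = frame.shape
    mask = np.ones((h, w))
    mask[h // 2 - cutoff:h // 2 + cutoff,
         w // 2 - cutoff:w // 2 + cutoff] = 0.0
    # Differencing the high-pass spectra isolates content that changed
    # between the two frames, i.e. candidate moving actors.
    diff = (f_curr - f_prev) * mask
    motion = np.abs(np.fft.ifft2(np.fft.ifftshift(diff)))
    return motion / (motion.max() + 1e-8)
```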
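The second pathway's frame sampling eliminates static frames before training. A simple stand-in for that behavior, keeping a frame only if it differs sufficiently from the last kept frame, is sketched below; the paper derives its sampling from ROI extraction, so the plain mean-difference test, the function name sample_dynamic_frames, and the threshold value are all simplifying assumptions.

```python
# Illustrative frame sampler (assumption: mean absolute difference as
# a proxy for the paper's ROI-based motion test). Static frames change
# little between timesteps and are discarded.
import numpy as np

def sample_dynamic_frames(frames, threshold=4.0):
    """frames: sequence of grayscale H x W arrays; returns the subset
    that shows enough change relative to the last kept frame."""
    kept = [frames[0]]
    for frame in frames[1:]:
        delta = np.abs(frame.astype(np.float32)
                       - kept[-1].astype(np.float32)).mean()
        if delta > threshold:
            kept.append(frame)
    return kept
```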
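For U-ActionNet Light, the abstract names MobileNetV2, a Fourier module, and BiLSTM as the building blocks. The PyTorch sketch below wires a MobileNetV2 backbone into a BiLSTM classifier to show the general shape of such a pipeline; it omits the Fourier module, and the class name UActionNetLightSketch, the hidden size, and the use of the final timestep for classification are assumptions rather than the published architecture.

```python
# Hypothetical MobileNetV2 + BiLSTM clip classifier (the paper's
# Fourier module is omitted; layer sizes are assumed).
import torch
import torch.nn as nn
from torchvision.models import mobilenet_v2

class UActionNetLightSketch(nn.Module):
    def __init__(self, num_classes: int, hidden: int = 256):
        super().__init__()
        # Per-frame convolutional features (1280 output channels).
        self.backbone = mobilenet_v2(weights=None).features
        self.pool = nn.AdaptiveAvgPool2d(1)
        # Bidirectional LSTM models the temporal dimension across frames.
        self.temporal = nn.LSTM(1280, hidden, batch_first=True,
                                bidirectional=True)
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, time, 3, H, W)
        b, t, c, h, w = clips.shape
        feats = self.backbone(clips.reshape(b * t, c, h, w))
        feats = self.pool(feats).reshape(b, t, -1)   # (b, t, 1280)
        seq, _ = self.temporal(feats)                # (b, t, 2 * hidden)
        return self.head(seq[:, -1])                 # logits per clip
```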

Source: IEEE Open Access Journals; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals

Subjects:
Accuracy
action recognition
Attention
Autonomous aerial vehicles
Cameras
Computational efficiency
Computational modeling
Datasets
Deep learning
Drones
Effectiveness
Fast Fourier transformations
Fast Fourier transforms
FFT
Fourier transforms
Human activity recognition
Human motion
Modules
Motion perception
optimization
Real time
Servers
Surveillance
Surveillance systems
Target recognition
Task complexity
tracking
Unmanned aerial vehicles
video surveillance