U-ActionNet: Dual-Pathway Fourier Networks With Region-of-Interest Module for Efficient Action Recognition in UAV Surveillance
Unmanned Aerial Vehicles (UAVs) have revolutionized human action recognition by offering a bird's-eye perspective, thereby unlocking unprecedented potential for comprehensive support within surveillance systems. This paper presents synergistic strategies for enhancing Human Action Recognition (HAR) in UAV imagery.
Saved in:
Published in: | IEEE Access 2024, Vol. 12, p. 189547-189563 |
---|---|
Main authors: | Monaf Chowdhury, Abdul; Imran, Ahsan; Hasan, Md Mehedi; Ahmed, Riad; Azad, Akm; Alyami, Salem A. |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 189563 |
---|---|
container_issue | |
container_start_page | 189547 |
container_title | IEEE access |
container_volume | 12 |
creator | Monaf Chowdhury, Abdul Imran, Ahsan Hasan, Md Mehedi Ahmed, Riad Azad, Akm Alyami, Salem A. |
description | Unmanned Aerial Vehicles (UAVs) have revolutionized human action recognition by offering a bird's-eye perspective, thereby unlocking unprecedented potential for comprehensive support within surveillance systems. This paper presents synergistic strategies for enhancing Human Action Recognition (HAR) in UAV imagery. Leveraging a dual-path approach, we propose a novel framework, U-ActionNet, that integrates FFT-based substance separation and space-time self-attention to improve the accuracy and efficiency of HAR tasks. Catering to the server side, the first pathway employs a modified C3D model with Fast Fourier Transform (FFT)-based object movement and attention detection mechanisms to extract human actors from complex scenes and capture spatiotemporal dynamics in UAV footage. Moreover, a generalized Region-of-Interest (ROI) module concentrates on optimal regions to enhance target recognition. Through extensive experiments on the Drone Action and UAV-Human datasets, we demonstrate the effectiveness of our approach, achieving superior performance over state-of-the-art methods with Top-1 accuracies of 94.94% and 95.05%, respectively. Meanwhile, the second pathway employs edge-optimized models, integrating an ROI extraction-based frame sampling technique that eliminates static frames while preserving the pivotal frames essential for model training. We introduce a lightweight model, U-ActionNet Light, which combines MobileNetV2, a Fourier module, and a BiLSTM and demands only one-ninth of the parameters of the server-side model. Demonstrating its efficacy, this model attains Top-1 accuracies of 80.43% and 84.74% on the same datasets, surpassing the baseline by a significant margin. Experimental results show that the presented frameworks are promising for surveillance, search and rescue, and activity monitoring, where accurate and real-time human action recognition from UAV platforms is essential. |
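The FFT-based separation and ROI-driven frame sampling described in the abstract can be sketched concretely: removing the zero-frequency (DC) bin of a temporal FFT suppresses whatever is constant across a clip (the static background), and the magnitude of the remaining residual both localizes the moving actor and ranks frames by motion. The following is a minimal NumPy illustration, not the authors' released code; the function names (`motion_residual`, `roi_bbox`, `sample_dynamic_frames`), the quantile threshold, and the keep-ratio are all hypothetical choices for exposition.

```python
import numpy as np

def motion_residual(frames: np.ndarray) -> np.ndarray:
    """frames: (T, H, W) grayscale clip, float in [0, 1].
    FFT along time; zeroing the DC bin removes content that is
    constant over the clip, leaving motion-dominant residual."""
    spectrum = np.fft.fft(frames, axis=0)
    spectrum[0] = 0.0                      # drop the static (DC) component
    return np.abs(np.fft.ifft(spectrum, axis=0))

def roi_bbox(motion: np.ndarray, q: float = 0.95):
    """Bounding box over the most dynamic pixels (top 1-q quantile)."""
    energy = motion.sum(axis=0)            # (H, W) per-pixel motion energy
    mask = energy > np.quantile(energy, q)
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None                        # no salient motion in the clip
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

def sample_dynamic_frames(frames: np.ndarray, keep_ratio: float = 0.5):
    """Keep the most dynamic frames, discarding near-static ones."""
    motion = motion_residual(frames)
    per_frame = motion.reshape(len(frames), -1).sum(axis=1)
    k = max(1, int(len(frames) * keep_ratio))
    keep = np.sort(np.argsort(per_frame)[-k:])   # restore temporal order
    return frames[keep], roi_bbox(motion)

clip = np.random.rand(32, 224, 224)        # stand-in for a decoded UAV clip
kept, bbox = sample_dynamic_frames(clip)
```

In this reading, one residual serves both pathways: its spatial energy map plays the role of the ROI module on the server side, while its per-frame energy drives the edge pathway's static-frame elimination before the lightweight MobileNetV2 + Fourier + BiLSTM model is trained.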
doi_str_mv | 10.1109/ACCESS.2024.3516586 |
format | Article |
fullrecord | (raw XML export omitted; recoverable fields retained below) |
publisher | Piscataway: IEEE |
coden | IAECCG |
rights | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024 |
ieee_id | 10794748 |
orcid | 0000-0002-5251-2214; 0009-0007-2035-7284; 0009-0004-8465-6142; 0000-0002-5400-7166; 0000-0002-5507-9399; 0000-0003-4326-4332 |
fulltext | fulltext |
identifier | ISSN: 2169-3536 |
ispartof | IEEE access, 2024, Vol.12, p.189547-189563 |
issn | 2169-3536 2169-3536 |
language | eng |
recordid | cdi_crossref_primary_10_1109_ACCESS_2024_3516586 |
source | IEEE Open Access Journals; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals |
subjects | Accuracy; action recognition; Attention; Autonomous aerial vehicles; Cameras; Computational efficiency; Computational modeling; Datasets; Deep learning; Drones; Effectiveness; Fast Fourier transformations; Fast Fourier transforms; FFT; Fourier transforms; Human activity recognition; Human motion; Modules; Motion perception; optimization; Real time; Servers; Surveillance; Surveillance systems; Target recognition; Task complexity; tracking; Unmanned aerial vehicles; video surveillance |
title | U-ActionNet: Dual-Pathway Fourier Networks With Region-of-Interest Module for Efficient Action Recognition in UAV Surveillance |
url | https://doi.org/10.1109/ACCESS.2024.3516586 |