Attention U-Net based on multi-scale feature extraction and WSDAN data augmentation for video anomaly detection
Saved in:
Published in: | Multimedia systems 2024-06, Vol.30 (3), Article 118 |
---|---|
Main authors: | Lei, Shanzhong ; Song, Junfang ; Wang, Tengjiao ; Wang, Fangxin ; Yan, Zhuyang |
Format: | Article |
Language: | English |
Keywords: | |
Online access: | Full text |
container_end_page | |
---|---|
container_issue | 3 |
container_start_page | |
container_title | Multimedia systems |
container_volume | 30 |
creator | Lei, Shanzhong ; Song, Junfang ; Wang, Tengjiao ; Wang, Fangxin ; Yan, Zhuyang |
description | The widespread adoption of video surveillance systems in public security and network security domains has underscored the importance of video anomaly detection as a pivotal research area. To enhance the precision and robustness of anomaly detection, this manuscript introduces an innovative method for video anomaly detection. The approach begins with the application of multi-scale feature extraction technology to capture visual information across varying scales in video data. Leveraging the Spatial Pyramid Convolution (SPC) module as the cornerstone for multi-scale feature learning, the study addresses the impact of scale variations, thereby augmenting the model’s detection capabilities across different scales. Furthermore, a Weakly Supervised Data Augmentation Network (WSDAN) module is incorporated to facilitate attention-guided data augmentation, enhancing the richness of input images. These augmented images undergo training with the U-Net network to elevate detection accuracy. Additionally, the integration of the improved Convolutional Block Attention Module (CBAM) into the base U-Net architecture enables end-to-end training. CBAM dynamically adjusts feature map weights, allowing the model to concentrate on anomaly relevant regions in the video while suppressing interference from non-anomalous areas. To assess anomalies, the paper employs the Peak Signal-to-Noise Ratio (PSNR) between predicted and original frames, normalizing PSNR values for anomaly identification. The proposed method is then evaluated using publicly available datasets CUHK Avenue and UCSD Ped2, with results visually presented. Experimental findings showcase Area Under the Receiver-Operating Characteristic Curve (AUC) values of 86.2% and 97.9% for these datasets, surpassing comparative methods and confirming the effectiveness and superiority of the proposed approach. |
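The abstract above describes scoring anomalies via the Peak Signal-to-Noise Ratio between each predicted frame and the original, with the PSNR values then normalized for anomaly identification. A minimal sketch of that scoring step, assuming standard PSNR and per-video min-max normalization (the function names, the `1e-8` stabilizer, and the toy data are illustrative, not the authors' implementation):

```python
import numpy as np

def psnr(pred: np.ndarray, target: np.ndarray, max_val: float = 255.0) -> float:
    """Peak Signal-to-Noise Ratio between a predicted and an original frame."""
    mse = np.mean((pred.astype(np.float64) - target.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical frames: perfect prediction
    return 10.0 * np.log10(max_val ** 2 / mse)

def normality_scores(psnr_values) -> np.ndarray:
    """Min-max normalize per-video PSNR values to [0, 1]; low scores flag anomalies."""
    p = np.asarray(psnr_values, dtype=np.float64)
    return (p - p.min()) / (p.max() - p.min() + 1e-8)

# Toy example: frame 1 is poorly predicted, so its normalized score drops toward 0.
rng = np.random.default_rng(0)
frames = [rng.integers(0, 256, (64, 64)) for _ in range(3)]
preds = [f + rng.normal(0, 2, f.shape) for f in frames]       # small prediction error
preds[1] = frames[1] + rng.normal(0, 30, frames[1].shape)     # large prediction error
scores = normality_scores([psnr(p, f) for p, f in zip(preds, frames)])
```

A frame whose prediction error is large gets a low PSNR and hence a score near 0, which is flagged as anomalous; well-predicted frames score near 1.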
doi_str_mv | 10.1007/s00530-024-01320-0 |
format | Article |
publisher | Springer Berlin Heidelberg |
rights | The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2024 |
orcidid | https://orcid.org/0009-0006-6948-6506 |
fulltext | fulltext |
identifier | ISSN: 0942-4962 |
ispartof | Multimedia systems, 2024-06, Vol.30 (3), Article 118 |
issn | 0942-4962 (print) ; 1432-1882 (electronic) |
language | eng |
recordid | cdi_proquest_journals_3034745690 |
source | SpringerNature Journals |
subjects | Anomalies ; Computer Communication Networks ; Computer Graphics ; Computer Science ; Cryptology ; Data augmentation ; Data Storage Representation ; Datasets ; Feature extraction ; Feature maps ; Image enhancement ; Machine learning ; Modules ; Multimedia Information Systems ; Noise prediction ; Operating Systems ; Regular Paper ; Security ; Signal to noise ratio ; Spatial data ; Surveillance systems ; Video data |
title | Attention U-Net based on multi-scale feature extraction and WSDAN data augmentation for video anomaly detection |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-02T00%3A22%3A58IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Attention%20U-Net%20based%20on%20multi-scale%20feature%20extraction%20and%20WSDAN%20data%20augmentation%20for%20video%20anomaly%20detection&rft.jtitle=Multimedia%20systems&rft.au=Lei,%20Shanzhong&rft.date=2024-06-01&rft.volume=30&rft.issue=3&rft.artnum=118&rft.issn=0942-4962&rft.eissn=1432-1882&rft_id=info:doi/10.1007/s00530-024-01320-0&rft_dat=%3Cproquest_cross%3E3034745690%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3034745690&rft_id=info:pmid/&rfr_iscdi=true |