Attention U-Net based on multi-scale feature extraction and WSDAN data augmentation for video anomaly detection


Detailed description

Bibliographic details
Published in: Multimedia systems, 2024-06, Vol. 30 (3), Article 118
Main authors: Lei, Shanzhong; Song, Junfang; Wang, Tengjiao; Wang, Fangxin; Yan, Zhuyang
Format: Article
Language: English
Subjects:
Online access: Full text
description The widespread adoption of video surveillance systems in public security and network security domains has underscored the importance of video anomaly detection as a pivotal research area. To enhance the precision and robustness of anomaly detection, this manuscript introduces an innovative method for video anomaly detection. The approach begins with the application of multi-scale feature extraction technology to capture visual information across varying scales in video data. Leveraging the Spatial Pyramid Convolution (SPC) module as the cornerstone for multi-scale feature learning, the study addresses the impact of scale variations, thereby augmenting the model’s detection capabilities across different scales. Furthermore, a Weakly Supervised Data Augmentation Network (WSDAN) module is incorporated to facilitate attention-guided data augmentation, enhancing the richness of input images. These augmented images undergo training with the U-Net network to elevate detection accuracy. Additionally, the integration of the improved Convolutional Block Attention Module (CBAM) into the base U-Net architecture enables end-to-end training. CBAM dynamically adjusts feature map weights, allowing the model to concentrate on anomaly relevant regions in the video while suppressing interference from non-anomalous areas. To assess anomalies, the paper employs the Peak Signal-to-Noise Ratio (PSNR) between predicted and original frames, normalizing PSNR values for anomaly identification. The proposed method is then evaluated using publicly available datasets CUHK Avenue and UCSD Ped2, with results visually presented. Experimental findings showcase Area Under the Receiver-Operating Characteristic Curve (AUC) values of 86.2% and 97.9% for these datasets, surpassing comparative methods and confirming the effectiveness and superiority of the proposed approach.
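The abstract's anomaly-scoring step (per-frame PSNR between the predicted and original frame, then min-max normalization) can be sketched as follows. This is a minimal illustration, not the authors' code; the function names and the exact normalization constant are assumptions, and the decision threshold is not specified in the record.

```python
import numpy as np

def psnr(pred: np.ndarray, target: np.ndarray, max_val: float = 255.0) -> float:
    """Peak Signal-to-Noise Ratio between a predicted and an original frame."""
    mse = np.mean((pred.astype(np.float64) - target.astype(np.float64)) ** 2)
    if mse == 0.0:
        return float("inf")  # identical frames: perfect prediction
    return 10.0 * np.log10(max_val ** 2 / mse)

def normalized_scores(psnr_values) -> np.ndarray:
    """Min-max normalize per-frame PSNR to [0, 1]; lower scores suggest anomalies."""
    p = np.asarray(psnr_values, dtype=np.float64)
    return (p - p.min()) / (p.max() - p.min() + 1e-8)  # epsilon avoids divide-by-zero
```

A frame whose normalized score falls below some chosen threshold would be flagged as anomalous; since poorly predicted (anomalous) frames yield low PSNR, low normalized scores mark likely anomalies.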
DOI: 10.1007/s00530-024-01320-0
ISSN: 0942-4962
EISSN: 1432-1882
Source: SpringerNature Journals
Subjects:
Anomalies
Computer Communication Networks
Computer Graphics
Computer Science
Cryptology
Data augmentation
Data Storage Representation
Datasets
Feature extraction
Feature maps
Image enhancement
Machine learning
Modules
Multimedia Information Systems
Noise prediction
Operating Systems
Regular Paper
Security
Signal to noise ratio
Spatial data
Surveillance systems
Video data
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-02T00%3A22%3A58IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Attention%20U-Net%20based%20on%20multi-scale%20feature%20extraction%20and%20WSDAN%20data%20augmentation%20for%20video%20anomaly%20detection&rft.jtitle=Multimedia%20systems&rft.au=Lei,%20Shanzhong&rft.date=2024-06-01&rft.volume=30&rft.issue=3&rft.artnum=118&rft.issn=0942-4962&rft.eissn=1432-1882&rft_id=info:doi/10.1007/s00530-024-01320-0&rft_dat=%3Cproquest_cross%3E3034745690%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3034745690&rft_id=info:pmid/&rfr_iscdi=true