Attention U-Net based on multi-scale feature extraction and WSDAN data augmentation for video anomaly detection


Detailed description

Bibliographic details
Published in: Multimedia systems, 2024-06, Vol. 30 (3), Article 118
Main authors: Lei, Shanzhong; Song, Junfang; Wang, Tengjiao; Wang, Fangxin; Yan, Zhuyang
Format: Article
Language: English
Subjects:
Online access: Full text
description The widespread adoption of video surveillance systems in public security and network security domains has underscored the importance of video anomaly detection as a pivotal research area. To enhance the precision and robustness of anomaly detection, this manuscript introduces an innovative method for video anomaly detection. The approach begins with the application of multi-scale feature extraction technology to capture visual information across varying scales in video data. Leveraging the Spatial Pyramid Convolution (SPC) module as the cornerstone for multi-scale feature learning, the study addresses the impact of scale variations, thereby augmenting the model’s detection capabilities across different scales. Furthermore, a Weakly Supervised Data Augmentation Network (WSDAN) module is incorporated to facilitate attention-guided data augmentation, enhancing the richness of input images. These augmented images undergo training with the U-Net network to elevate detection accuracy. Additionally, the integration of the improved Convolutional Block Attention Module (CBAM) into the base U-Net architecture enables end-to-end training. CBAM dynamically adjusts feature map weights, allowing the model to concentrate on anomaly relevant regions in the video while suppressing interference from non-anomalous areas. To assess anomalies, the paper employs the Peak Signal-to-Noise Ratio (PSNR) between predicted and original frames, normalizing PSNR values for anomaly identification. The proposed method is then evaluated using publicly available datasets CUHK Avenue and UCSD Ped2, with results visually presented. Experimental findings showcase Area Under the Receiver-Operating Characteristic Curve (AUC) values of 86.2% and 97.9% for these datasets, surpassing comparative methods and confirming the effectiveness and superiority of the proposed approach.
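The abstract's anomaly-scoring step (per-frame PSNR between the predicted and original frame, then min-max normalization) can be sketched as follows. This is a minimal illustration, not the authors' code; the function names and the exact normalization constant are assumptions, and the decision threshold is not specified in the record.

```python
import numpy as np

def psnr(pred: np.ndarray, target: np.ndarray, max_val: float = 255.0) -> float:
    """Peak Signal-to-Noise Ratio between a predicted and an original frame."""
    mse = np.mean((pred.astype(np.float64) - target.astype(np.float64)) ** 2)
    if mse == 0.0:
        return float("inf")  # identical frames: perfect prediction
    return 10.0 * np.log10(max_val ** 2 / mse)

def normalized_scores(psnr_values) -> np.ndarray:
    """Min-max normalize per-frame PSNR to [0, 1]; lower scores suggest anomalies."""
    p = np.asarray(psnr_values, dtype=np.float64)
    return (p - p.min()) / (p.max() - p.min() + 1e-8)  # epsilon avoids divide-by-zero
```

A frame whose normalized score falls below some chosen threshold would be flagged as anomalous; since poorly predicted (anomalous) frames yield low PSNR, low normalized scores mark likely anomalies.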
DOI: 10.1007/s00530-024-01320-0
ISSN: 0942-4962
EISSN: 1432-1882
Source: SpringerNature Journals
Subjects:
Anomalies
Computer Communication Networks
Computer Graphics
Computer Science
Cryptology
Data augmentation
Data Storage Representation
Datasets
Feature extraction
Feature maps
Image enhancement
Machine learning
Modules
Multimedia Information Systems
Noise prediction
Operating Systems
Regular Paper
Security
Signal to noise ratio
Spatial data
Surveillance systems
Video data
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-02T00%3A22%3A58IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Attention%20U-Net%20based%20on%20multi-scale%20feature%20extraction%20and%20WSDAN%20data%20augmentation%20for%20video%20anomaly%20detection&rft.jtitle=Multimedia%20systems&rft.au=Lei,%20Shanzhong&rft.date=2024-06-01&rft.volume=30&rft.issue=3&rft.artnum=118&rft.issn=0942-4962&rft.eissn=1432-1882&rft_id=info:doi/10.1007/s00530-024-01320-0&rft_dat=%3Cproquest_cross%3E3034745690%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3034745690&rft_id=info:pmid/&rfr_iscdi=true