GlocalCLIP: Object-agnostic Global-Local Prompt Learning for Zero-shot Anomaly Detection

Published in: arXiv.org, 2024-12
Authors: Ham, Jiyul; Jung, Yonggon; Jun-Geol Baek
Format: Article
Language: English
Abstract: Zero-shot anomaly detection (ZSAD) is crucial for detecting anomalous patterns in target datasets without using training samples, specifically in scenarios where there are distributional differences between the target domain and training data or where data scarcity arises because of restricted access. Although recently pretrained vision-language models demonstrate strong zero-shot performance across various visual tasks, they focus on learning class semantics, which makes their direct application to ZSAD challenging. To address this scenario, we propose GlocalCLIP, which uniquely separates global and local prompts and jointly optimizes them. This approach enables the object-agnostic glocal semantic prompt to effectively capture general normal and anomalous patterns without dependency on specific objects in the image. We refine the text prompts for more precise adjustments by utilizing deep-text prompt tuning in the text encoder. In the vision encoder, we apply V-V attention layers to capture detailed local image features. Finally, we introduce glocal contrastive learning to improve the complementary learning of global and local prompts, effectively detecting anomalous patterns across various domains. The generalization performance of GlocalCLIP in ZSAD was demonstrated on 15 real-world datasets from both the industrial and medical domains, achieving superior performance compared to existing methods. Code will be made available at https://github.com/YUL-git/GlocalCLIP.
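The abstract's core idea of pairing an image-level (global) prompt with a pixel-level (local) prompt can be illustrated with a minimal sketch. This is a hypothetical illustration, not the paper's actual implementation: the function names, the raw cosine similarities, and the two-way softmax over [normal, anomaly] text embeddings are all assumptions made for clarity.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between rows of a and rows of b.
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

def glocal_anomaly_scores(global_feat, patch_feats, normal_txt, anomaly_txt):
    """Illustrative global + local anomaly scoring (hypothetical sketch).

    global_feat: (d,) image embedding from the vision encoder
    patch_feats: (n, d) patch embeddings (local features)
    normal_txt / anomaly_txt: (d,) text embeddings of the learned prompts
    """
    txt = np.stack([normal_txt, anomaly_txt])            # (2, d)

    # Global score: softmax over [normal, anomaly] similarities,
    # take the probability assigned to the anomaly prompt.
    g = cosine(global_feat[None, :], txt)[0]             # (2,)
    g = np.exp(g) / np.exp(g).sum()
    global_score = g[1]

    # Local map: the same comparison per patch, yielding a
    # per-patch anomaly probability for localization.
    l = cosine(patch_feats, txt)                         # (n, 2)
    l = np.exp(l) / np.exp(l).sum(axis=1, keepdims=True)
    local_map = l[:, 1]                                  # (n,)

    return global_score, local_map
```

In this toy setup, an image whose global embedding aligns with the anomaly prompt gets a high global score, while patches aligned with the normal prompt get low values in the local map; the paper's joint optimization and glocal contrastive learning would shape the prompt embeddings themselves, which this sketch simply takes as given.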
EISSN: 2331-8422
Subjects: Anomalies; Coders; Datasets; Learning; Semantics; Target detection; Vision; Visual tasks