FiLo++: Zero-/Few-Shot Anomaly Detection by Fused Fine-Grained Descriptions and Deformable Localization
Main authors: | , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Abstract: | Anomaly detection methods typically require extensive normal samples from the
target class for training, limiting their applicability in scenarios that
require rapid adaptation, such as cold start. Zero-shot and few-shot anomaly
detection do not require labeled samples from the target class in advance,
making them a promising research direction. Existing zero-shot and few-shot
approaches often leverage powerful multimodal models to detect and localize
anomalies by comparing image-text similarity. However, their handcrafted
generic descriptions fail to capture the diverse range of anomalies that may
emerge in different objects, and simple patch-level image-text matching often
struggles to localize anomalous regions of varying shapes and sizes. To address
these issues, this paper proposes the FiLo++ method, which consists of two key
components. The first component, Fused Fine-Grained Descriptions (FusDes),
utilizes large language models to generate anomaly descriptions for each object
category, combines both fixed and learnable prompt templates and applies a
runtime prompt filtering method, producing more accurate and task-specific
textual descriptions. The second component, Deformable Localization (DefLoc),
integrates the vision foundation model Grounding DINO with position-enhanced
text descriptions and a Multi-scale Deformable Cross-modal Interaction (MDCI)
module, enabling accurate localization of anomalies with various shapes and
sizes. In addition, we design a position-enhanced patch matching approach to
improve few-shot anomaly detection performance. Experiments on multiple
datasets demonstrate that FiLo++ achieves significant performance improvements
compared with existing methods. Code will be available at
https://github.com/CASIA-IVA-Lab/FiLo. |
DOI: | 10.48550/arxiv.2501.10067 |
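
The abstract notes that existing zero-shot and few-shot approaches detect and localize anomalies by comparing image-text similarity at the patch level. The following is a minimal sketch of that generic baseline idea only, not of FiLo++ itself. It assumes patch and text embeddings already live in a shared space; the function names, the toy 2-D vectors, and the temperature value of 0.07 are all illustrative assumptions rather than details taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def patch_anomaly_scores(patch_feats, normal_text, abnormal_text, temperature=0.07):
    """Score each patch by how much closer it is to the 'abnormal' text
    embedding than to the 'normal' one (hypothetical helper; the
    temperature is an assumed value)."""
    scores = []
    for patch in patch_feats:
        logits = [
            cosine(patch, normal_text) / temperature,
            cosine(patch, abnormal_text) / temperature,
        ]
        scores.append(softmax(logits)[1])  # probability of the abnormal class
    return scores


# Toy embeddings standing in for real encoder outputs (assumed data).
normal_text = [1.0, 0.0]
abnormal_text = [0.0, 1.0]
patches = [[0.9, 0.1], [0.2, 0.8], [1.0, 0.0]]

scores = patch_anomaly_scores(patches, normal_text, abnormal_text)
image_score = max(scores)  # image-level score: the most anomalous patch
```

In this toy setup the second patch points toward the abnormal text embedding and so receives the highest score. Reshaping the per-patch scores onto the patch grid would give a coarse anomaly map, which is the kind of fixed-grid matching that the abstract says struggles with anomalous regions of varying shapes and sizes.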