Produce Once, Utilize Twice for Anomaly Detection

Bibliographic details
Published in: IEEE transactions on circuits and systems for video technology, 2024-11, Vol. 34 (11), p. 11751-11767
Main authors: Wang, Shuyuan; Li, Qi; Luo, Huiyuan; Lv, Chengkan; Zhang, Zhengtao
Format: Article
Language: English
Description
Summary: Visual anomaly detection aims at classifying and locating the regions that deviate from the normal appearance. Embedding-based methods and reconstruction-based methods are the two main approaches to this task. Embedding-based methods typically predict anomalies by measuring the distances between the deep representations of the test samples and those of a limited number of nominal samples. This makes them efficient, but they struggle to provide fine-grained, pixel-level anomaly localization. Reconstruction-based methods rely on pixel-level reconstruction errors to locate anomalies, so their predictions are fine-grained. However, they involve repeated feature extraction and usually need extra modules to guarantee the quality of the reconstructed images, which results in unsatisfactory detection efficiency. In a nutshell, the prior methods are either not efficient enough or not precise enough for industrial detection. To deal with this problem, we propose POUTA (Produce Once Utilize Twice for Anomaly detection), which improves both accuracy and efficiency by reusing the discriminative information latent in the reconstructive network. We observe that the encoder and decoder representations of the reconstructive network can stand for the features of the original and the reconstructed image, respectively. Moreover, the discrepancies between the symmetric reconstructive representations provide roughly accurate anomaly information. To refine this information, POUTA uses a coarse-to-fine process that calibrates the semantics of each discriminative layer using the high-level representations and a supervision loss. Equipped with these modules, POUTA provides more precise anomaly localization than prior art. In addition, reusing the representations makes it possible to omit the feature extraction process in the discriminative network, which reduces the number of parameters and improves efficiency. Extensive experiments show that POUTA is superior or comparable to the prior methods at even lower cost. Furthermore, POUTA also achieves better performance than state-of-the-art few-shot anomaly detection methods without any special design, showing that POUTA has a strong ability to learn the representations inherent in the training data.
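
The summary's central observation, that discrepancies between symmetric encoder and decoder representations give a rough anomaly signal, can be illustrated with a small sketch. This is not the authors' implementation: the summary names neither the distance measure nor the scoring rule, so the cosine distance, the maximum used as an image-level score, and all names (`cosine_distance`, `discrepancy_map`, `coarse_map`) are assumptions made for illustration. The coarse-to-fine refinement described in the summary is not shown.

```python
import math


def cosine_distance(u, v, eps=1e-8):
    """Return 1 - cosine similarity of two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v + eps)


def discrepancy_map(enc, dec):
    """Per-location discrepancy between an encoder feature map and its
    symmetric decoder feature map, both given as nested lists [H][W][C].

    Assumption: cosine distance is used here only as an example measure.
    """
    return [
        [cosine_distance(e, d) for e, d in zip(enc_row, dec_row)]
        for enc_row, dec_row in zip(enc, dec)
    ]


# Toy 2x2 feature maps with 2 channels each (made-up values).
# The decoder features differ from the encoder features only at (1, 1).
enc = [[[1.0, 0.0], [1.0, 0.0]],
       [[0.0, 1.0], [1.0, 1.0]]]
dec = [[[1.0, 0.0], [1.0, 0.0]],
       [[0.0, 1.0], [1.0, -1.0]]]

coarse_map = discrepancy_map(enc, dec)

# Assumption: take the largest per-location discrepancy as the image score.
image_score = max(max(row) for row in coarse_map)
```

In this toy case the map is close to zero wherever the two representations agree and close to one at the single location where they are orthogonal, which is the kind of coarse localization signal the summary says is later refined.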
ISSN: 1051-8215, 1558-2205
DOI: 10.1109/TCSVT.2024.3420775