Advancing Pre-trained Teacher: Towards Robust Feature Discrepancy for Anomaly Detection
Main authors: , , , ,
Format: Article
Language: eng
Subjects:
Online access: Order full text
Summary: With the wide application of knowledge distillation between an ImageNet
pre-trained teacher model and a learnable student model, industrial anomaly
detection has achieved significant progress in the past few years. The success
of knowledge distillation mainly relies on maintaining the feature discrepancy
between the teacher and student models, under two assumptions: (1) the teacher
model can jointly represent two different distributions for normal and abnormal
patterns, while (2) the student model can only reconstruct the normal
distribution. However, maintaining these ideal assumptions in practice remains
challenging. In this paper, we propose a simple yet effective two-stage
industrial anomaly detection framework, termed AAND, which sequentially
performs Anomaly Amplification and Normality Distillation to obtain robust
feature discrepancy. In the first anomaly amplification stage, we propose a
novel Residual Anomaly Amplification (RAA) module to advance the pre-trained
teacher encoder. With exposure to synthetic anomalies, it amplifies anomalies
via residual generation while maintaining the integrity of the pre-trained
model. It mainly comprises a Matching-guided Residual Gate and an
Attribute-scaling Residual Generator, which determine the proportion and
characteristics of the residuals, respectively. In the second normality
distillation stage, we further employ a reverse distillation paradigm to train
a student decoder, in which a novel Hard Knowledge Distillation (HKD) loss is
introduced to better facilitate the reconstruction of normal patterns.
Comprehensive experiments on the MvTecAD, VisA, and MvTec3D-RGB datasets show
that our method achieves state-of-the-art performance.
DOI: 10.48550/arxiv.2405.02068
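The two-stage idea described in the abstract can be illustrated with a minimal, hypothetical PyTorch sketch (this is not the authors' released code): a gated residual head that amplifies anomalies on top of frozen teacher features, plus a reverse-distillation-style anomaly map computed as one minus the cosine similarity between teacher and student features. All class and function names, tensor shapes, and the exact gate/generator designs below are assumptions made for illustration only.

```python
# Hedged sketch, assuming ResNet-like multi-scale teacher/student features.
# Names such as ResidualAnomalyAmplification and anomaly_map are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ResidualAnomalyAmplification(nn.Module):
    """Adds a gated residual on top of frozen teacher features so anomalies
    are amplified while the pre-trained teacher itself stays untouched."""

    def __init__(self, channels: int):
        super().__init__()
        # Residual generator (assumed small conv head): shapes the residual.
        self.generator = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        # Residual gate (assumed per-pixel gate in [0, 1]): scales the residual.
        self.gate = nn.Sequential(
            nn.Conv2d(channels, 1, 1),
            nn.Sigmoid(),
        )

    def forward(self, teacher_feat: torch.Tensor) -> torch.Tensor:
        residual = self.generator(teacher_feat)   # residual characteristic
        alpha = self.gate(teacher_feat)           # residual proportion
        return teacher_feat + alpha * residual    # amplified teacher feature


def anomaly_map(teacher_feats, student_feats, out_size):
    """Reverse-distillation-style scoring (assumed convention): 1 - cosine
    similarity between teacher and student features, averaged over scales."""
    maps = []
    for t, s in zip(teacher_feats, student_feats):
        m = 1.0 - F.cosine_similarity(t, s, dim=1, eps=1e-6)   # (B, H, W)
        maps.append(F.interpolate(m.unsqueeze(1), size=out_size,
                                  mode="bilinear", align_corners=False))
    return torch.stack(maps, dim=0).mean(dim=0)                # (B, 1, H, W)
```

For example, with teacher features of shape (B, 256, 64, 64), the module returns amplified features of the same shape, which can then be distilled into a student decoder; pixels where the student fails to reconstruct the (amplified) teacher features receive high values in the resulting anomaly map.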