Attention-based Knowledge Distillation in Multi-attention Tasks: The Impact of a DCT-driven Loss
Saved in:
Main authors: , , ,
Format: Article
Language: eng
Subjects:
Online access: Order full text
Summary: Knowledge Distillation (KD) is a strategy for defining a set of transferability gangways to improve the efficiency of Convolutional Neural Networks. Feature-based Knowledge Distillation is a subfield of KD that relies on intermediate network representations, either unaltered or depth-reduced via maximum activation maps, as the source knowledge. In this paper, we propose and analyse the use of a 2D frequency transform of the activation maps before transferring them. We posit that, by using global image cues rather than pixel estimates, this strategy enhances knowledge transferability in tasks such as scene recognition, which is defined by strong spatial and contextual relationships between multiple and varied concepts. To validate the proposed method, an extensive evaluation of the state of the art in scene recognition is presented. Experimental results provide strong evidence that the proposed strategy enables the student network to better focus on the relevant image areas learnt by the teacher network, hence leading to more descriptive features and higher transferred performance than every other state-of-the-art alternative. We publicly release the training and evaluation framework used in this paper at http://www-vpu.eps.uam.es/publications/DCTBasedKDForSceneRecognition.
DOI: 10.48550/arxiv.2205.01997
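The record above summarizes the method but gives no implementation details. The sketch below is an illustrative, unofficial reconstruction of what a DCT-based feature-distillation loss of this kind might look like, assuming PyTorch, maximum-activation channel pooling, an orthonormal 2D DCT-II, and a mean-squared error over the resulting coefficients; the function names (`dct_matrix`, `dct_2d`, `dct_kd_loss`) and the weighting constant `lambda_kd` are hypothetical and do not come from the authors' released framework.

```python
import math
import torch
import torch.nn.functional as F


def dct_matrix(n: int) -> torch.Tensor:
    # Orthonormal DCT-II basis: row k holds cos(pi/n * (i + 0.5) * k).
    k = torch.arange(n, dtype=torch.float32).unsqueeze(1)
    i = torch.arange(n, dtype=torch.float32).unsqueeze(0)
    basis = torch.cos(math.pi / n * (i + 0.5) * k) * math.sqrt(2.0 / n)
    basis[0] /= math.sqrt(2.0)  # DC row scaling for orthonormality
    return basis


def dct_2d(x: torch.Tensor) -> torch.Tensor:
    # Separable 2D DCT over the last two (spatial) dimensions.
    dh = dct_matrix(x.shape[-2]).to(x)
    dw = dct_matrix(x.shape[-1]).to(x)
    return dh @ x @ dw.t()


def dct_kd_loss(student_feat: torch.Tensor, teacher_feat: torch.Tensor) -> torch.Tensor:
    """Illustrative DCT-based distillation loss between two feature tensors
    of shape (B, C, H, W); the channel counts may differ."""
    # Depth-reduce each tensor to a single maximum activation map per image.
    s_map = student_feat.amax(dim=1)
    t_map = teacher_feat.amax(dim=1)
    # If the spatial sizes differ (assumption), resize the student map to match.
    if s_map.shape[-2:] != t_map.shape[-2:]:
        s_map = F.interpolate(s_map.unsqueeze(1), size=t_map.shape[-2:],
                              mode="bilinear", align_corners=False).squeeze(1)
    # Compare global frequency content (DCT coefficients) instead of raw pixel values.
    return F.mse_loss(dct_2d(s_map), dct_2d(t_map))


# Hypothetical usage: distill an intermediate block's activations from teacher to student.
# loss = task_loss + lambda_kd * dct_kd_loss(student_block_out, teacher_block_out)
```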