Explaining Knowledge Distillation by Quantifying the Knowledge
Main authors:
Format: Article
Language: English
Keywords:
Online access: Order full text
Summary: This paper presents a method to interpret the success of knowledge distillation by quantifying and analyzing task-relevant and task-irrelevant visual concepts that are encoded in intermediate layers of a deep neural network (DNN). More specifically, three hypotheses are proposed as follows. 1. Knowledge distillation makes the DNN learn more visual concepts than learning from raw data. 2. Knowledge distillation ensures that the DNN is prone to learning various visual concepts simultaneously, whereas a DNN learning from raw data tends to learn visual concepts sequentially. 3. Knowledge distillation yields more stable optimization directions than learning from raw data. Accordingly, we design three types of mathematical metrics to evaluate feature representations of the DNN. In experiments, we diagnosed various DNNs, and the above hypotheses were verified.
DOI: 10.48550/arxiv.2003.03622
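The summary above refers to mathematical metrics that quantify the visual concepts encoded in a DNN's intermediate-layer features. The sketch below is a minimal illustration of that general idea, not the paper's actual metric: it assumes a "visual concept" can be approximated as a spatial position whose aggregated activation clearly stands out from the background, and the threshold `tau`, the `(C, H, W)` feature layout, and the helper `count_visual_concepts` are assumptions made only for this example.

```python
# Minimal sketch (illustrative only, not the paper's metric): count how many
# spatial positions of an intermediate feature map stand out from the background.
# Comparing this count between a distilled student and a model trained from raw
# data is the flavor of hypothesis 1 in the summary above.
import torch


def count_visual_concepts(features: torch.Tensor, tau: float = 0.5) -> int:
    """features: (C, H, W) activations of one intermediate layer for one image.
    tau: assumed threshold separating "concept" positions from background."""
    # Collapse channels into a single (H, W) saliency map.
    saliency = features.abs().mean(dim=0)
    # Normalize to [0, 1] so tau is comparable across layers and models.
    saliency = (saliency - saliency.min()) / (saliency.max() - saliency.min() + 1e-8)
    # Count positions whose normalized activation exceeds the threshold.
    return int((saliency > tau).sum().item())


# Example usage with a random feature map standing in for a real layer output.
if __name__ == "__main__":
    feats = torch.randn(256, 14, 14)
    print("approximate number of visual concepts:", count_visual_concepts(feats))
```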