Knowledge Distillation with Contrastive Inter-Class Relationship

Bibliographic Details
Published in: Journal of Physics: Conference Series, 2021-02, Vol. 1756 (1), p. 012001
Main authors: Yang, Chaoyi; Zeng, Jijun; Zhang, Jinbo
Format: Article
Language: English
Description
Abstract: Due to the high computational cost, the application of deep neural networks (DNNs) to real-time tasks has been limited. A possible solution is to compress the model so that its demand for computational resources is reduced. A popular method for doing so is knowledge distillation (KD). The basic philosophy behind KD is to transfer the information extracted by a larger teacher network to a smaller student network. The usual transfer strategy is to match logits or intermediate layers one-to-one between the teacher and student networks. This objective may neglect the informative relationships between different samples. In this paper, we borrow the idea of metric learning to transfer the contrastive relationship information learned by the teacher network to the student. Specifically, we use the well-known triplet loss to regularize the training of the student network. With a modified negative selection strategy, our Contrastive Knowledge Distillation (CKD) method efficiently improves the performance of the student network compared with traditional KD methods. Experiments on KD benchmarks and real-world datasets further demonstrate the superiority of CKD.
ISSN: 1742-6588; 1742-6596
DOI: 10.1088/1742-6596/1756/1/012001
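
To make the abstract's description more concrete, below is a minimal, hypothetical sketch (not the authors' code) of triplet-loss-regularized knowledge distillation in PyTorch. The function names (kd_loss, contrastive_kd_loss), the temperature and margin values, and the rule of picking the hardest different-class negative according to teacher embedding distances are illustrative assumptions standing in for the paper's modified negative selection strategy.

    # Hypothetical sketch of triplet-loss-regularized knowledge distillation.
    # Not the paper's implementation: the hard-negative rule below (closest
    # different-class sample in the *teacher's* embedding space) is only one
    # plausible stand-in for the "modified negative selection strategy".
    import torch
    import torch.nn.functional as F

    def kd_loss(student_logits, teacher_logits, T=4.0):
        # Classic soft-target distillation: KL divergence between softened
        # teacher and student distributions, scaled by T^2.
        p_t = F.softmax(teacher_logits / T, dim=1)
        log_p_s = F.log_softmax(student_logits / T, dim=1)
        return F.kl_div(log_p_s, p_t, reduction="batchmean") * (T * T)

    def contrastive_kd_loss(student_emb, teacher_emb, labels, margin=0.2):
        # Build (anchor, positive, negative) triplets within the batch and
        # apply the triplet loss to the *student* embeddings, so the student
        # reproduces the inter-sample relationships seen by the teacher.
        dist_t = torch.cdist(teacher_emb, teacher_emb)     # teacher distances
        same = labels.unsqueeze(0) == labels.unsqueeze(1)  # same-class mask
        eye = torch.eye(len(labels), dtype=torch.bool, device=labels.device)
        pos_mask, neg_mask = same & ~eye, ~same

        triplet = torch.nn.TripletMarginLoss(margin=margin)
        losses = []
        for i in range(len(labels)):
            if pos_mask[i].any() and neg_mask[i].any():
                p = torch.nonzero(pos_mask[i])[0].item()   # any same-class sample
                d = dist_t[i].masked_fill(~neg_mask[i], float("inf"))
                n = d.argmin().item()                      # teacher's hardest negative
                losses.append(triplet(student_emb[i:i + 1],
                                      student_emb[p:p + 1],
                                      student_emb[n:n + 1]))
        return torch.stack(losses).mean() if losses else student_emb.sum() * 0.0

    # Illustrative combination during student training (weights are guesses):
    # loss = F.cross_entropy(s_logits, labels) \
    #        + 0.5 * kd_loss(s_logits, t_logits) \
    #        + 0.1 * contrastive_kd_loss(s_emb, t_emb, labels)

In such a setup the student would typically be trained on a weighted sum of the ordinary cross-entropy loss, the soft-target KD term, and the contrastive term; the actual loss weights and negative selection rule used by the paper are not specified in this record.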