Hierarchical Multi-Attention Transfer for Knowledge Distillation

Knowledge distillation (KD) is a powerful and widely applicable technique for compressing deep learning models. The main idea of KD is to transfer knowledge from a large teacher model to a small student model, where the attention mechanism has been intensively explored in...
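
The paper's specific hierarchical multi-attention transfer scheme is not described in this truncated abstract; the sketch below is only a minimal, generic illustration of attention-based knowledge distillation, combining the standard soft-label KD loss with an attention-map matching loss. It assumes PyTorch, and all function names, loss weights, and the requirement that student and teacher feature maps share spatial sizes are illustrative assumptions, not the authors' method.

    import torch
    import torch.nn.functional as F

    def attention_map(feature):
        # Collapse a conv feature map (N, C, H, W) into a spatial attention
        # map (N, H*W) by summing squared channel activations, then L2-normalize.
        att = feature.pow(2).sum(dim=1).flatten(1)
        return F.normalize(att, dim=1)

    def kd_with_attention_loss(student_logits, teacher_logits,
                               student_feats, teacher_feats,
                               labels, T=4.0, alpha=0.5, beta=100.0):
        # Hard-label cross-entropy on the ground-truth labels.
        ce = F.cross_entropy(student_logits, labels)
        # Soft-label distillation: KL divergence between temperature-softened
        # teacher and student output distributions (Hinton-style KD).
        kd = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                      F.softmax(teacher_logits / T, dim=1),
                      reduction="batchmean") * (T * T)
        # Attention transfer: match normalized spatial attention maps at each
        # paired stage; assumes matching spatial resolutions for simplicity.
        at = sum(F.mse_loss(attention_map(s), attention_map(t))
                 for s, t in zip(student_feats, teacher_feats))
        return ce + alpha * kd + beta * at

Given per-stage feature maps from both networks, such a loss would be minimized with respect to the student's parameters only, with the teacher kept frozen.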

Bibliographic details
Published in: ACM Transactions on Multimedia Computing, Communications, and Applications, 2023-09, Vol. 20 (2), p. 1-20, Article 51
Authors: Gou, Jianping; Sun, Liyuan; Yu, Baosheng; Wan, Shaohua; Tao, Dacheng
Format: Article
Language: English
Online access: Full text