Layer-fusion for online mutual knowledge distillation

Bibliographic Details
Published in: Multimedia Systems, 2023-04, Vol. 29 (2), pp. 787-796
Authors: Hu, Gan; Ji, Yanli; Liang, Xingzhu; Han, Yuexing
Format: Article
Language: English
Online access: Full text
Description
Abstract: Online knowledge distillation opens the door to distillation among parallel student networks, removing the heavy reliance on a pre-trained teacher model. Additional feature-fusion solutions further establish a positive training loop among the parallel student networks. However, current feature-fusion operations are placed only at the end of the sub-networks, which limits their capability. In this paper, we propose a novel online knowledge distillation approach that connects sub-networks through multiple layer-level feature-fusion modules, triggering mutual learning among the student networks. During training, the fusion modules at the middle layers serve as auxiliary teachers, while the fusion module at the end of the sub-networks serves as the ensemble teacher. Each sub-network is optimized under the supervision of the two kinds of knowledge transmitted by these teachers. Furthermore, attention learning is adopted in the middle-layer fusion modules to enhance feature representation and yield more representative features. Extensive evaluations on the CIFAR10/CIFAR100 and ImageNet2012 datasets demonstrate the outstanding performance of the proposed approach.
ISSN: 0942-4962, 1432-1882
DOI: 10.1007/s00530-022-01021-6
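
The abstract outlines the approach at a high level: layer-level fusion modules connect parallel student sub-networks, the middle-layer fusion modules act as auxiliary teachers, and the fusion module at the end acts as the ensemble teacher, with attention learning inside the fusion modules. Below is a minimal PyTorch sketch of that idea, not the authors' implementation: the channel-attention gate, auxiliary classifier heads, temperature, and loss weights (alpha, beta) are all illustrative assumptions.

# Illustrative sketch only: module names, shapes, temperature and loss weights
# are assumptions; the paper's concrete architecture is not reproduced in this record.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionFusion(nn.Module):
    """Fuse same-stage feature maps from S parallel student sub-networks
    with a channel-attention gate (one plausible form of 'attention learning')."""

    def __init__(self, channels: int, num_students: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels * num_students, num_students, kernel_size=1),
        )

    def forward(self, feats):  # feats: list of S tensors, each (B, C, H, W)
        weights = torch.softmax(self.gate(torch.cat(feats, dim=1)), dim=1)  # (B, S, 1, 1)
        return sum(weights[:, i:i + 1] * f for i, f in enumerate(feats))    # (B, C, H, W)


def kd_loss(student_logits, teacher_logits, T: float = 3.0):
    """Standard temperature-scaled KL distillation loss (temperature T assumed)."""
    p_teacher = F.softmax(teacher_logits.detach() / T, dim=1)
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)


def training_step(students, mid_fusions, mid_heads, end_fusion, end_head,
                  images, labels, alpha=1.0, beta=1.0):
    """One step with two kinds of teacher knowledge: middle-layer fusion modules
    as auxiliary teachers and the end fusion module as the ensemble teacher."""
    # Each student is assumed to return (list of stage feature maps, final logits).
    feats_per_student, logits = zip(*(s(images) for s in students))

    # Hard-label supervision for every student.
    loss = sum(F.cross_entropy(l, labels) for l in logits)

    # Auxiliary teachers: fuse stage-j features across students, classify the
    # fused map, and distil its prediction back into every student.
    for j, (fusion, head) in enumerate(zip(mid_fusions, mid_heads)):
        aux_logits = head(fusion([f[j] for f in feats_per_student]))
        loss = loss + F.cross_entropy(aux_logits, labels)
        loss = loss + alpha * sum(kd_loss(l, aux_logits) for l in logits)

    # Ensemble teacher: fusion module at the end of the sub-networks.
    ens_logits = end_head(end_fusion([f[-1] for f in feats_per_student]))
    loss = loss + F.cross_entropy(ens_logits, labels)
    loss = loss + beta * sum(kd_loss(l, ens_logits) for l in logits)
    return loss

In a full training loop, the students would be complete sub-networks (e.g. ResNet branches returning their stage features plus logits), mid_heads would be small auxiliary classifiers on the fused features, and the returned loss would be backpropagated jointly through all branches and fusion modules; those details are omitted from this sketch.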