Layer-fusion for online mutual knowledge distillation
Published in: Multimedia Systems, 2023-04, Vol. 29 (2), pp. 787-796
Main authors: , , ,
Format: Article
Language: eng
Subjects:
Online access: Full text
Abstract: Online knowledge distillation opens a door for distillation among parallel student networks, breaking the heavy reliance on a pre-trained teacher model. Additional feature-fusion solutions further provide a positive training loop among the parallel student networks. However, the current feature-fusion operation is always placed at the end of the sub-networks, so its capability is limited. In this paper, we propose a novel online knowledge distillation approach that designs multiple layer-level feature fusion modules to connect the sub-networks, which helps trigger mutual learning among the student networks. During training, the fusion modules at the middle layers are regarded as auxiliary teachers, while the fusion module at the end of the sub-networks serves as the ensemble teacher. Each sub-network is optimized under the supervision of the two kinds of knowledge transmitted by these different teachers. Furthermore, attention learning is adopted to enhance the feature representation in the fusion modules applied to the middle layers, which helps obtain representative features. Extensive evaluations are performed on the CIFAR10/CIFAR100 and ImageNet2012 datasets, and the experimental results demonstrate the outstanding performance of the proposed approach.
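To make the two ingredients named in the abstract concrete, the following is a minimal, hypothetical PyTorch sketch, not the authors' implementation: the class `AttentionFusion`, its SE-style channel attention, the helper `distill_loss`, and the temperature `T` are all illustrative assumptions. It shows a layer-level fusion module that combines the parallel students' feature maps into an auxiliary-teacher feature, and a soft-label KL loss through which a student can be supervised by a teacher signal.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionFusion(nn.Module):
    """Fuse feature maps from parallel student sub-networks (hypothetical sketch).

    Concatenates the students' feature maps along the channel axis, re-weights
    channels with a squeeze-and-excitation style attention, and projects back
    to the original channel count so the fused map can serve as an
    auxiliary-teacher feature for that layer.
    """

    def __init__(self, channels: int, num_students: int, reduction: int = 4):
        super().__init__()
        fused = channels * num_students
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                          # global channel descriptor
            nn.Conv2d(fused, fused // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(fused // reduction, fused, kernel_size=1),
            nn.Sigmoid(),                                     # per-channel weights in (0, 1)
        )
        self.proj = nn.Conv2d(fused, channels, kernel_size=1)

    def forward(self, feats):
        # feats: list of (B, C, H, W) tensors, one per student sub-network
        x = torch.cat(feats, dim=1)       # (B, C * num_students, H, W)
        x = x * self.attn(x)              # channel attention before fusion
        return self.proj(x)               # fused auxiliary-teacher feature (B, C, H, W)


def distill_loss(student_logits, teacher_logits, T: float = 3.0):
    """Soft-label KL distillation from a detached teacher to a student."""
    p_s = F.log_softmax(student_logits / T, dim=1)
    p_t = F.softmax(teacher_logits.detach() / T, dim=1)
    return F.kl_div(p_s, p_t, reduction="batchmean") * (T * T)
```

In a full pipeline of the kind the abstract describes, each student at a given depth would feed its feature map into an `AttentionFusion` module whose output supervises that layer, the fusion module after the last layer would play the role of the ensemble teacher, and each student's total objective would combine the ordinary cross-entropy term with `distill_loss` terms from both kinds of teacher.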
ISSN: 0942-4962, 1432-1882
DOI: 10.1007/s00530-022-01021-6