Multi-scale Feature Extraction and Fusion for Online Knowledge Distillation
Format: Article
Language: English
Online access: Order full text
Abstract: Online knowledge distillation conducts knowledge transfer among all student models to alleviate the reliance on pre-trained models. However, existing online methods rely heavily on prediction distributions and neglect further exploration of representational knowledge. In this paper, we propose a novel Multi-scale Feature Extraction and Fusion method (MFEF) for online knowledge distillation, which comprises three key components: Multi-scale Feature Extraction, Dual-attention, and Feature Fusion, towards generating more informative feature maps for distillation. Multi-scale Feature Extraction exploits a divide-and-concatenate strategy in the channel dimension to improve the multi-scale representation ability of feature maps. To obtain more accurate information, we design a Dual-attention module that adaptively strengthens important channel and spatial regions. Moreover, Feature Fusion aggregates and fuses the previously processed feature maps to assist the training of the student models. Extensive experiments on CIFAR-10, CIFAR-100, and CINIC-10 show that MFEF transfers more beneficial representational knowledge for distillation and outperforms alternative methods across various network architectures.
DOI: 10.48550/arxiv.2206.08224
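The three components named in the abstract suggest a concrete pipeline. Below is a minimal PyTorch sketch of how such a pipeline could look. All specifics here are illustrative assumptions inferred from the abstract, not the authors' implementation: the class names (`MultiScaleExtraction`, `DualAttention`, `FeatureFusion`), the dilated-convolution branches used to realize divide-and-concatenate, the squeeze-and-excitation-style channel gate, and the 1x1 fusion convolution are all hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiScaleExtraction(nn.Module):
    """Divide-and-concatenate in the channel dimension (assumed form):
    split channels into groups, process each group at a different
    receptive field via dilation, then concatenate the results."""
    def __init__(self, channels, groups=4):
        super().__init__()
        assert channels % groups == 0
        split = channels // groups
        # One 3x3 conv per group; increasing dilation varies the scale
        # while padding keeps the spatial size unchanged.
        self.branches = nn.ModuleList([
            nn.Conv2d(split, split, 3, padding=d, dilation=d)
            for d in range(1, groups + 1)
        ])

    def forward(self, x):
        chunks = torch.chunk(x, len(self.branches), dim=1)
        return torch.cat([b(c) for b, c in zip(self.branches, chunks)], dim=1)


class DualAttention(nn.Module):
    """Channel attention followed by spatial attention, one plausible
    reading of the abstract's dual-attention component."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        # Channel gate: global pooling -> bottleneck -> per-channel weights.
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        # Spatial gate: conv over channel-wise mean and max maps.
        self.spatial = nn.Sequential(
            nn.Conv2d(2, 1, 7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x):
        x = x * self.channel(x)
        pooled = torch.cat([x.mean(1, keepdim=True),
                            x.amax(1, keepdim=True)], dim=1)
        return x * self.spatial(pooled)


class FeatureFusion(nn.Module):
    """Aggregates the processed feature maps from all student branches
    into one fused map (assumed: 1x1 conv over their concatenation)."""
    def __init__(self, channels, num_students):
        super().__init__()
        self.fuse = nn.Conv2d(channels * num_students, channels, 1)

    def forward(self, feats):
        return self.fuse(torch.cat(feats, dim=1))


# Usage sketch: fuse the last-stage feature maps of three students and
# use the fused map as a shared distillation target for each branch.
students_feats = [torch.randn(2, 64, 8, 8) for _ in range(3)]
extract = MultiScaleExtraction(64)
attend = DualAttention(64)
fusion = FeatureFusion(64, num_students=3)

processed = [attend(extract(f)) for f in students_feats]
fused = fusion(processed)
print(fused.shape)  # torch.Size([2, 64, 8, 8])

# One plausible feature-level distillation signal: each student branch
# regresses toward the fused map (detached here so it acts as a target).
loss = sum(F.mse_loss(p, fused.detach()) for p in processed)
```

In this sketch the fused map serves as a shared target that every student branch is pulled toward; whether MFEF detaches the fused map or trains it jointly with the students, and how this feature loss is weighted against the usual prediction-level distillation loss, is not specified in the abstract.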