DFEF: Diversify feature enhancement and fusion for online knowledge distillation

Bibliographic Details
Published in: Expert Systems, 2024-09, Vol. 41 (9), p. n/a
Main authors: Liang, Xingzhu; Zhang, Jian; Liu, Erhu; Fang, Xianjin
Format: Article
Language: English
Online Access: Full text
Description
Abstract: Traditional knowledge distillation relies on high-capacity teacher models to supervise the training of compact student networks. Teacher-free online knowledge distillation methods avoid the computational cost of pretraining such high-capacity teachers and have achieved satisfactory performance. Among these methods, feature fusion approaches effectively alleviate the limitations of training without the strong guidance of a powerful teacher model. However, existing feature fusion methods often focus primarily on end-layer features, overlooking the efficient utilization of holistic knowledge loops and high-level information within the network. In this article, we propose a new feature fusion-based mutual learning method called Diversify Feature Enhancement and Fusion for Online Knowledge Distillation (DFEF). First, we enhance advanced semantic information by mapping multiple end-of-network features to obtain richer feature representations. Next, we design a self-distillation module to strengthen knowledge interactions between the deep and shallow network layers. Additionally, we employ attention mechanisms to provide deeper and more diversified enhancements to the input feature maps of the self-distillation module, allowing the entire network architecture to acquire a broader range of knowledge. Finally, we employ feature fusion to merge the enhanced features and generate a high-performance virtual teacher to guide the training of the student model. Extensive evaluations on the CIFAR-10, CIFAR-100, and CINIC-10 datasets demonstrate that our proposed method significantly outperforms state-of-the-art feature fusion-based online knowledge distillation methods. Our code can be found at https://github.com/JSJ515-Group/DFEF-Liu.
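
The abstract outlines the core pipeline: attention-enhanced end-layer features from several peer branches are fused into a "virtual teacher" whose softened outputs supervise each student. The PyTorch sketch below is only an illustration of that idea under stated assumptions; the class names, the squeeze-and-excitation style attention, and the fusion classifier are placeholders inferred from the abstract, not the authors' released implementation (see the GitHub link above).

    # Illustrative sketch only: names and module choices are assumptions
    # based on the abstract, not the DFEF reference code.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ChannelAttention(nn.Module):
        """Squeeze-and-excitation style channel attention, used here as a
        stand-in for the paper's attention-based feature enhancement."""
        def __init__(self, channels, reduction=8):
            super().__init__()
            self.fc = nn.Sequential(
                nn.Linear(channels, channels // reduction),
                nn.ReLU(inplace=True),
                nn.Linear(channels // reduction, channels),
                nn.Sigmoid(),
            )

        def forward(self, x):
            # x: (B, C, H, W) -> per-channel weights (B, C, 1, 1)
            w = self.fc(x.mean(dim=(2, 3))).unsqueeze(-1).unsqueeze(-1)
            return x * w

    class VirtualTeacherFusion(nn.Module):
        """Fuse enhanced end-layer features from several student branches
        into a single 'virtual teacher' logit vector."""
        def __init__(self, channels, num_branches, num_classes):
            super().__init__()
            self.attn = nn.ModuleList(
                [ChannelAttention(channels) for _ in range(num_branches)])
            self.classifier = nn.Linear(channels * num_branches, num_classes)

        def forward(self, feats):
            # feats: list of (B, C, H, W) end-layer feature maps, one per branch
            enhanced = [F.adaptive_avg_pool2d(a(f), 1).flatten(1)
                        for a, f in zip(self.attn, feats)]
            return self.classifier(torch.cat(enhanced, dim=1))  # teacher logits

    def distillation_loss(student_logits, teacher_logits, T=3.0):
        """KL divergence between softened student outputs and the detached
        virtual-teacher outputs (standard knowledge-distillation loss)."""
        return F.kl_div(F.log_softmax(student_logits / T, dim=1),
                        F.softmax(teacher_logits.detach() / T, dim=1),
                        reduction="batchmean") * T * T

In training, the fused teacher logits from VirtualTeacherFusion would be detached and compared against each branch's own logits via distillation_loss, alongside the usual cross-entropy terms for every branch.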
ISSN: 0266-4720, 1468-0394
DOI: 10.1111/exsy.13593