MTED: multiple teachers ensemble distillation for compact semantic segmentation



Bibliographic Details
Published in: Neural Computing & Applications, 2023-06, Vol. 35 (16), p. 11789-11806
Authors: Wang, Chen; Zhong, Jiang; Dai, Qizhu; Yu, Qien; Qi, Yafei; Fang, Bin; Li, Xue
Format: Article
Language: English
Online Access: Full text
Description

Abstract: Current state-of-the-art semantic segmentation models achieve great success. However, their vast model size and computational cost limit their application in many real-time systems and on mobile devices. Knowledge distillation is one promising solution for compressing segmentation models. However, the knowledge from a single teacher may be insufficient, and the student may also inherit bias from the teacher. This paper proposes a multi-teacher ensemble distillation framework named MTED for semantic segmentation. The key idea is to effectively transfer comprehensive knowledge from multiple teachers to one student. We present a multi-teacher output-based distillation loss that distills the valuable knowledge in the output probabilities to the student, and we construct an adaptive weight assignment module that dynamically assigns different weights to different teachers at each pixel. In addition, we introduce a multi-teacher feature-based distillation loss to transfer the comprehensive knowledge in the feature maps efficiently. Extensive experiments on three benchmark datasets, Cityscapes, CamVid, and Pascal VOC 2012, show that the proposed MTED performs much better than single-teacher distillation methods on all three.
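The core mechanism the abstract describes, a per-pixel weighted ensemble of teacher output distributions distilled into the student via a divergence loss, can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the function name, the use of KL divergence, and the assumption that the adaptive weight assignment module has already produced normalized per-pixel teacher weights are all assumptions for the sake of the example.

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax over the class axis."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def ensemble_distillation_loss(student_logits, teacher_logits_list, pixel_weights):
    """Hypothetical sketch of a multi-teacher output-based distillation loss.

    student_logits:      (H, W, C) student class scores per pixel
    teacher_logits_list: list of T arrays of shape (H, W, C), one per teacher
    pixel_weights:       (T, H, W) per-pixel teacher weights summing to 1 over
                         the teacher axis; in MTED these would come from the
                         adaptive weight assignment module, but here they are
                         taken as given

    Returns the mean per-pixel KL(ensemble || student).
    """
    teacher_probs = np.stack([softmax(t) for t in teacher_logits_list])  # (T, H, W, C)
    # Pixel-wise weighted ensemble of the teachers' output distributions
    ensemble = (pixel_weights[..., None] * teacher_probs).sum(axis=0)    # (H, W, C)
    student_log_probs = np.log(softmax(student_logits) + 1e-12)
    kl = (ensemble * (np.log(ensemble + 1e-12) - student_log_probs)).sum(axis=-1)
    return kl.mean()

# Toy usage with uniform teacher weights as a stand-in for the adaptive module
H, W, C, T = 4, 4, 3, 2
rng = np.random.default_rng(0)
teachers = [rng.normal(size=(H, W, C)) for _ in range(T)]
student = rng.normal(size=(H, W, C))
weights = np.full((T, H, W), 1.0 / T)
loss = ensemble_distillation_loss(student, teachers, weights)
```

The feature-based distillation loss the abstract also mentions would analogously compare student and ensembled teacher feature maps (e.g. via an L2 term), and the full training objective would combine both distillation terms with the ordinary segmentation loss.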
ISSN: 0941-0643, 1433-3058
DOI: 10.1007/s00521-023-08321-6