Balanced self-distillation for long-tailed recognition

Bibliographic Details
Published in: Knowledge-Based Systems, 2024-04, Vol. 290, p. 111504, Article 111504
Main Authors: Ren, Ning; Li, Xiaosong; Wu, Yanxia; Fu, Yan
Format: Article
Language: English
Online Access: Full text
Description
Abstract: In long-tailed recognition tasks, knowledge distillation is widely adopted to improve the performance of deep neural networks. These methods distill knowledge from a pretrained teacher model to a student model, enabling higher long-tailed recognition accuracy. However, the dependence on accompanying assistive models complicates the training of the single network and incurs large memory and time costs. In this work, we present Balanced Self-Distillation (BSD), which distills tail knowledge within a single network, without assistive models. Specifically, BSD distills knowledge between different distortions of the same samples to stimulate the representation-learning potential of the single network, and adopts a balanced class weight to shift the distillation focus from head classes to tail classes. Comprehensive experiments across diverse datasets, including CIFAR-10-LT, CIFAR-100-LT, and TinyImageNet-LT, show that BSD consistently outperforms strong baseline methods. In particular, BSD improves accuracy by 8.13% on CIFAR-100-LT with an imbalance ratio of 100 compared to the cross-entropy baseline. Furthermore, the proposed method integrates seamlessly with contemporary techniques such as re-sampling, meta-learning, and cost-sensitive learning, making it a versatile tool for addressing the challenges of long-tailed scenarios.
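
As a rough illustration of the mechanism the abstract describes (not the authors' released implementation), the PyTorch sketch below shows one plausible form of such a loss: two distortions (augmentations) of the same batch pass through a single network, a temperature-softened KL term distills between their predictions, and per-class weights inversely proportional to class frequency shift the distillation focus toward tail classes. All names here (balanced_self_distillation_loss, class_counts, the alpha and temperature defaults) are hypothetical placeholders, not taken from the paper.

import torch
import torch.nn.functional as F

def balanced_self_distillation_loss(logits_a, logits_b, targets,
                                    class_counts, temperature=4.0, alpha=1.0):
    # logits_a, logits_b: (batch, num_classes) predictions for two
    #   distortions of the same samples, from the same network.
    # targets: (batch,) ground-truth labels.
    # class_counts: (num_classes,) number of training samples per class.

    # Standard classification loss on the first view.
    ce = F.cross_entropy(logits_a, targets)

    # Balanced per-class weights: inversely proportional to class
    # frequency, normalized to mean 1, so tail-class samples dominate
    # the distillation term.
    w = 1.0 / class_counts.float()
    w = w / w.sum() * class_counts.numel()
    sample_w = w[targets]  # (batch,)

    # Self-distillation between the two views: the second view's
    # detached, temperature-softened prediction teaches the first.
    t = temperature
    p_teacher = F.softmax(logits_b.detach() / t, dim=1)
    log_p_student = F.log_softmax(logits_a / t, dim=1)
    kl = F.kl_div(log_p_student, p_teacher, reduction='none').sum(dim=1)
    distill = (sample_w * kl * t * t).mean()

    return ce + alpha * distill

# Usage sketch (model, aug_a, aug_b, images, labels, counts are placeholders):
# logits_a = model(aug_a(images))
# logits_b = model(aug_b(images))
# loss = balanced_self_distillation_loss(logits_a, logits_b, labels, counts)

Under these assumptions, no separate teacher network is trained or stored: one model provides both sides of the distillation, and the class-frequency weights realize the head-to-tail shift of focus that the abstract mentions.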
ISSN: 0950-7051, 1872-7409
DOI: 10.1016/j.knosys.2024.111504