GEIKD: Self-knowledge distillation based on gated ensemble networks and influences-based label noise removal

Bibliographic details
Published in: Computer Vision and Image Understanding, 2023-10, Vol. 235, p. 103771, Article 103771
Main authors: Liu, Fuchang; Wang, Yu; Li, Zheng; Pan, Zhigeng
Format: Article
Language: English
Description
Abstract: Self-distillation has gained widespread attention in recent years because it progressively transfers knowledge within a single network in an end-to-end training scheme. However, self-distillation methods are susceptible to label noise, which leads to poor generalization performance. To address this problem, this paper proposes a novel self-distillation method, called GEIKD, which combines a gated ensemble self-teacher network with influences-based label noise removal. Specifically, we design a gated ensemble self-teacher network composed of multiple teacher branches, whose knowledge is fused through a gate based on a weighted bi-directional feature pyramid network. Moreover, we introduce influence estimation into the distillation process to quantify the effect of noisy labels on the distillation loss, and then reject unfavorable instances as noisy-labeled samples according to the calculated influences. Our influences-based label noise removal can be integrated with any existing knowledge distillation training scheme. The proposed removal of noisy instances significantly alleviates the impact of noisy labels on knowledge distillation with little extra training effort. Experiments show that GEIKD outperforms state-of-the-art methods on CIFAR-100, TinyImageNet, the fine-grained datasets CUB200, MIT-67 and Stanford40, and the FERC dataset, using both clean data and data with noisy labels.
Highlights:
• Propose a gated ensemble self-teacher network that consists of multiple self-teacher branches and is robust to noisy labels without any data augmentation.
• Propose influences-based label noise removal by plugging influence estimation into the self-KD training process.
• Develop a plug-and-play defense tool that protects any existing knowledge distillation network against poisoning attacks with mislabeling.
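The abstract does not specify how the gate combines the teacher branches, and it does not name the distillation loss. The following is a minimal NumPy sketch built on two assumptions.

* The gate is assumed to be the fast normalized fusion used in BiFPN (EfficientDet), w_i = ReLU(w_i) / (ε + Σ_j ReLU(w_j)). Here it is applied to branch logits rather than to feature maps.
* The self-distillation objective is assumed to be a standard temperature-scaled KD loss, combining cross-entropy with a T²-scaled KL divergence.

The names `fast_normalized_fusion` and `self_kd_loss` and the parameters `T`, `alpha` and `eps` are hypothetical. They do not come from the paper.

```python
import numpy as np


def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis (numerically stable)."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def fast_normalized_fusion(branch_logits, raw_weights, eps=1e-4):
    """Fuse per-branch outputs with BiFPN-style weights (an assumed gate).

    branch_logits: array of shape (num_branches, batch, num_classes)
    raw_weights:   array of shape (num_branches,), unconstrained scalars
    """
    w = np.maximum(raw_weights, 0.0)               # ReLU keeps weights non-negative
    w = w / (eps + w.sum())                        # normalize so weights sum to ~1
    return np.tensordot(w, branch_logits, axes=1)  # weighted sum over branches


def self_kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """(1 - alpha) * CE(student, labels) + alpha * T^2 * KL(teacher_T || student_T)."""
    n = len(labels)
    p_s = softmax(student_logits)
    ce = -np.log(p_s[np.arange(n), labels] + 1e-12).mean()
    q_t = softmax(teacher_logits, T)               # soft targets from the fused teacher
    q_s = softmax(student_logits, T)
    kl = np.sum(q_t * (np.log(q_t + 1e-12) - np.log(q_s + 1e-12)), axis=-1).mean()
    return (1 - alpha) * ce + alpha * T ** 2 * kl
```

In a real training loop, the raw fusion weights would be learnable parameters updated by backpropagation. The ReLU-and-normalize step keeps them non-negative and summing to roughly one, at a lower cost than a softmax.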
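The abstract says that influences quantify how noisy labels affect the loss and are used to reject harmful samples, but it gives no formula. As an illustration only, the sketch below swaps in the classical influence function of Koh and Liang (2017), I_i = -∇L_val^T H^{-1} ∇ℓ_i.

* The influence is computed exactly for a small L2-regularized logistic regression, where the Hessian H is cheap to invert.
* A positive I_i means that up-weighting sample i would raise the loss on a clean validation set.
* The samples with the largest scores are therefore dropped as suspected noisy labels.

Several parts of this setup are assumptions, not the paper's method: the clean validation set, the linear model, the drop fraction `frac`, the regularizer `lam`, and all function names.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def _hessian(X, p, lam):
    # Hessian of the mean logistic loss plus (lam / 2) * ||w||^2
    return (X.T * (p * (1.0 - p))) @ X / len(X) + lam * np.eye(X.shape[1])


def fit_logreg(X, y, lam=1e-2, iters=50):
    """Newton's method for L2-regularized logistic regression (labels in {0, 1})."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = sigmoid(X @ w)
        grad = X.T @ (p - y) / len(y) + lam * w
        w -= np.linalg.solve(_hessian(X, p, lam), grad)
    return w


def influence_scores(X, y, X_val, y_val, w, lam=1e-2):
    """I_i = -g_val^T H^{-1} g_i: effect of up-weighting sample i on validation loss."""
    p = sigmoid(X @ w)
    G_train = X * (p - y)[:, None]                 # per-sample loss gradients, (n, d)
    p_val = sigmoid(X_val @ w)
    g_val = X_val.T @ (p_val - y_val) / len(y_val) # mean validation-loss gradient
    h_inv_g_val = np.linalg.solve(_hessian(X, p, lam), g_val)
    return -G_train @ h_inv_g_val


def remove_by_influence(X, y, X_val, y_val, frac=0.1, lam=1e-2):
    """Drop the `frac` share of training samples with the most harmful influence."""
    w = fit_logreg(X, y, lam)
    scores = influence_scores(X, y, X_val, y_val, w, lam)
    k = int(frac * len(y))
    drop = np.argsort(scores)[-k:] if k > 0 else np.array([], dtype=int)
    keep = np.setdiff1d(np.arange(len(y)), drop)
    return keep, drop
```

For a deep network, H^{-1}∇L_val is usually approximated rather than formed explicitly, for example with conjugate gradients or stochastic estimation. The scoring and rejection logic stays the same.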
ISSN: 1077-3142, 1090-235X
DOI: 10.1016/j.cviu.2023.103771