Model Behavior Preserving for Class-Incremental Learning
Published in: IEEE Transactions on Neural Networks and Learning Systems, Oct. 2023, vol. 34, no. 10, pp. 7529-7540
Main authors:
Format: Article
Language: English
Keywords:
Abstract: Deep models have been shown to be vulnerable to catastrophic forgetting, a phenomenon in which recognition performance on old data degrades when a pre-trained model is fine-tuned on new data. Knowledge distillation (KD) is a popular incremental-learning approach to alleviating catastrophic forgetting. However, it usually fixes the absolute values of neural responses for isolated historical instances, without considering the intrinsic structure of the responses produced by a convolutional neural network (CNN) model. To overcome this limitation, we recognize the importance of the global property of the whole instance set and treat it as a behavior characteristic of a CNN model relevant to incremental learning. On this basis: 1) we design an instance neighborhood-preserving (INP) loss to maintain the order of pairwise instance similarities of the old model in the feature space; 2) we devise a label priority-preserving (LPP) loss to preserve the label ranking lists within instance-wise label probability vectors in the output space; and 3) we introduce an efficient differentiable ranking algorithm for computing the two loss functions. Extensive experiments on CIFAR100 and ImageNet show that our approach achieves state-of-the-art performance.
ISSN: 2162-237X, 2162-2388
DOI: 10.1109/TNNLS.2022.3144183
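
The abstract above describes two behavior-preserving losses: an INP loss over the ordering of pairwise instance similarities in the feature space, and an LPP loss over per-instance class-probability rankings in the output space. As a rough illustration only, the sketch below implements generic hinge-based ranking surrogates for these two ideas in PyTorch. It is not the authors' code: the function names inp_loss/lpp_loss, the cosine-similarity choice, and the margin hinge are assumptions made here, and the paper's own efficient differentiable ranking algorithm is not reproduced.

```python
# Hypothetical sketch (not the authors' implementation): pairwise-ranking
# surrogates for the INP and LPP ideas summarized in the abstract.
import torch
import torch.nn.functional as F


def inp_loss(old_feats, new_feats, margin=0.0):
    """Instance neighborhood-preserving surrogate (assumed formulation).

    If the old model ranked one within-batch instance pair as more similar
    than another, penalize the new model for reversing that order.
    """
    # Cosine-similarity matrices over the batch (B x B).
    old_sim = F.normalize(old_feats, dim=1) @ F.normalize(old_feats, dim=1).T
    new_sim = F.normalize(new_feats, dim=1) @ F.normalize(new_feats, dim=1).T

    # Keep only the upper triangle (unique unordered pairs), flattened.
    idx = torch.triu_indices(old_sim.size(0), old_sim.size(1), offset=1)
    old_s = old_sim[idx[0], idx[1]]
    new_s = new_sim[idx[0], idx[1]]

    # Sign of the old pairwise-similarity ordering; hinge on the new ordering.
    order = torch.sign(old_s.unsqueeze(0) - old_s.unsqueeze(1))
    diff = new_s.unsqueeze(0) - new_s.unsqueeze(1)
    return F.relu(margin - order * diff).mean()


def lpp_loss(old_logits, new_logits, margin=0.0):
    """Label priority-preserving surrogate (assumed formulation).

    For each instance, if the old model ranked class i above class j,
    penalize the new model for reversing that class ranking.
    """
    old_p = old_logits.softmax(dim=1)
    new_p = new_logits.softmax(dim=1)
    order = torch.sign(old_p.unsqueeze(2) - old_p.unsqueeze(1))  # B x C x C
    diff = new_p.unsqueeze(2) - new_p.unsqueeze(1)
    return F.relu(margin - order * diff).mean()


if __name__ == "__main__":
    # Toy check with random tensors standing in for old/new model outputs.
    torch.manual_seed(0)
    old_f, new_f = torch.randn(8, 64), torch.randn(8, 64)
    old_l, new_l = torch.randn(8, 10), torch.randn(8, 10)
    print(inp_loss(old_f, new_f).item(), lpp_loss(old_l, new_l).item())
```

In practice such terms would be added to the usual classification (and possibly KD) objective when training on new classes, with the old model frozen and evaluated on stored exemplars; the weighting of the terms is not specified in this record.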