Knowledge distillation for incremental learning in semantic segmentation
Published in: Computer Vision and Image Understanding, 2021-04, Vol. 205, Article 103167
Main authors: ,
Format: Article
Language: English
Online access: Full text
Abstract: Deep learning architectures have shown remarkable results in scene understanding problems; however, they exhibit a critical drop in performance when required to learn new tasks incrementally without forgetting old ones. This catastrophic forgetting phenomenon hampers the deployment of artificial intelligence in real-world scenarios where systems need to learn new and different representations over time. Current approaches to incremental learning deal only with image classification and object detection tasks, while in this work we formally introduce incremental learning for semantic segmentation. We tackle the problem by applying various knowledge distillation techniques on the previous model: in this way we retain the information about learned classes while updating the current model to learn the new ones. We develop four main knowledge distillation methodologies working on both the output layers and the internal feature representations. We do not store any image belonging to previous training stages, and only the last model is used to preserve high accuracy on previously learned classes. Extensive experimental results on the Pascal VOC2012 and MSRC-v2 datasets show the effectiveness of the proposed approaches in several incremental learning scenarios.
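The output-level distillation described in the abstract can be sketched in a few lines. The following is a minimal PyTorch example, not the authors' exact formulation: the function name, the weight lambda_kd, the temperature, and the ignore index 255 are illustrative assumptions. The current model is trained with cross-entropy on the new annotations, while a KL-divergence term keeps its predictions over the old classes close to those of the frozen previous-step model, so no images from earlier training stages need to be stored.

```python
import torch.nn.functional as F

def incremental_seg_loss(new_logits, old_logits, labels,
                         num_old_classes, lambda_kd=1.0, temperature=2.0):
    """Cross-entropy on current labels plus output-level distillation.

    new_logits: (B, C_old + C_new, H, W) scores from the model being trained.
    old_logits: (B, C_old, H, W) scores from the frozen previous-step model.
    labels:     (B, H, W) ground-truth class indices for the current step.
    """
    # Supervised term: standard pixel-wise cross-entropy on all classes.
    ce = F.cross_entropy(new_logits, labels, ignore_index=255)

    # Distillation term: soften both distributions over the old classes
    # with a temperature and match them with KL divergence, so the new
    # model keeps behaving like the old one on previously learned classes.
    t = temperature
    log_p_new = F.log_softmax(new_logits[:, :num_old_classes] / t, dim=1)
    p_old = F.softmax(old_logits.detach() / t, dim=1)
    kd = F.kl_div(log_p_new, p_old, reduction="batchmean") * (t * t)

    return ce + lambda_kd * kd
```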
Highlights:
•This work is the first study on incremental learning for semantic segmentation.
•We introduce four novel distillation schemes to preserve previous knowledge.
•A distillation loss enforces similarity of multiple decoding stages simultaneously (sketched below).
•We conducted extensive experiments on the Pascal VOC2012 and MSRC-v2 datasets.
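The third highlight, distillation over multiple decoding stages at once, can also be illustrated with a short sketch. This is an assumed implementation, not the paper's exact loss: the function name and stage_weights parameter are hypothetical, and a plain L2 penalty is used to align each intermediate decoder feature map of the new model with the corresponding one from the frozen previous model.

```python
import torch.nn.functional as F

def multi_stage_distillation(new_feats, old_feats, stage_weights=None):
    """L2 distillation applied to several decoder stages simultaneously.

    new_feats, old_feats: lists of (B, C, H, W) feature maps, one per
    decoding stage, from the current model and the frozen previous model.
    stage_weights: optional per-stage weights (uniform if omitted).
    """
    if stage_weights is None:
        stage_weights = [1.0] * len(new_feats)
    loss = 0.0
    for w, f_new, f_old in zip(stage_weights, new_feats, old_feats):
        # detach(): the previous model is frozen and only provides targets.
        loss = loss + w * F.mse_loss(f_new, f_old.detach())
    return loss
```

This term would be added to the output-level loss above, weighting how strongly each internal representation is anchored to the previous model.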
ISSN: 1077-3142, 1090-235X
DOI: 10.1016/j.cviu.2021.103167