Efficient Architecture Search for Continual Learning

Continual learning with neural networks, which aims to learn a sequence of tasks, is an important learning framework in artificial intelligence (AI). However, it often confronts three challenges: 1) overcome the catastrophic forgetting problem; 2) adapt the current network to new tasks; and 3) contr...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:IEEE transaction on neural networks and learning systems 2023-11, Vol.34 (11), p.8555-8565
Hauptverfasser: Gao, Qiang, Luo, Zhipeng, Klabjan, Diego, Zhang, Fengli
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Continual learning with neural networks, which aims to learn a sequence of tasks, is an important learning framework in artificial intelligence (AI). However, it often confronts three challenges: 1) overcome the catastrophic forgetting problem; 2) adapt the current network to new tasks; and 3) control its model complexity. To reach these goals, we propose a novel approach named continual learning with efficient architecture search (CLEAS). CLEAS works closely with neural architecture search (NAS), which leverages reinforcement learning techniques to search for the best neural architecture that fits a new task. In particular, we design a neuron-level NAS controller that decides which old neurons from previous tasks should be reused (knowledge transfer) and which new neurons should be added (to learn new knowledge). Such a fine-grained controller allows finding a very concise architecture that can fit each new task well. Meanwhile, since we do not alter the weights of the reused neurons, we perfectly memorize the knowledge learned from the previous tasks. We evaluate CLEAS on numerous sequential classification tasks, and the results demonstrate that CLEAS outperforms other state-of-the-art alternative methods, achieving higher classification accuracy while using simpler neural architectures.
ISSN:2162-237X
2162-2388
DOI:10.1109/TNNLS.2022.3151511