System and method for knowledge distillation between neural networks

Detailed Description

Bibliographic Details
Authors: Mori, Gregory; Tung, Frederick
Format: Patent
Language: English
Description
Summary: Systems and methods for knowledge distillation provide supervised training of a student network with a teacher network, including:

- inputting a batch to the teacher network,
- inputting the batch to the student network,
- generating a teacher activation map at a layer of the teacher network,
- generating a student activation map at a layer of the student network corresponding to the layer of the teacher network,
- generating a pairwise teacher similarity matrix based on the teacher activation map,
- generating a pairwise student similarity matrix based on the student activation map, and
- minimizing a knowledge distillation loss defined as the difference between the pairwise teacher similarity matrix and the pairwise student similarity matrix.
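The loss described in the summary can be sketched in a few lines. The following NumPy sketch is an illustration, not the patented implementation: the function names are hypothetical, and the row-wise normalization of the similarity matrices and the 1/b² scaling of the loss are assumptions made to keep the loss scale-invariant, consistent with common similarity-preserving distillation formulations.

```python
import numpy as np

def pairwise_similarity(activation_map):
    """Row-normalized pairwise similarity matrix for a batch of activations.

    activation_map: array of shape (batch, ...) -- flattened per sample.
    Returns a (batch, batch) matrix of pairwise inner products,
    L2-normalized along each row (normalization is an assumption).
    """
    b = activation_map.shape[0]
    A = activation_map.reshape(b, -1)          # (b, d) flattened activations
    G = A @ A.T                                # (b, b) pairwise similarities
    norms = np.linalg.norm(G, axis=1, keepdims=True)
    return G / np.maximum(norms, 1e-12)        # guard against zero rows

def distillation_loss(teacher_act, student_act):
    """Difference between teacher and student pairwise similarity matrices.

    Squared Frobenius norm of the difference, divided by batch size
    squared (the scaling is an assumption for illustration).
    """
    b = teacher_act.shape[0]
    G_t = pairwise_similarity(teacher_act)
    G_s = pairwise_similarity(student_act)
    return np.sum((G_t - G_s) ** 2) / (b * b)
```

With identical teacher and student activations the loss is zero; it grows as the student's pairwise similarity structure diverges from the teacher's, which is what minimizing it during training encourages.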