System and method for knowledge distillation between neural networks

Detailed Description

Bibliographic Details
Authors: Mori, Gregory; Tung, Frederick
Format: Patent
Language: English
Description
Summary: Systems and methods for knowledge distillation provide supervised training of a student network with a teacher network, including:

- inputting a batch to the teacher network,
- inputting the batch to the student network,
- generating a teacher activation map at a layer of the teacher network,
- generating a student activation map at a layer of the student network corresponding to the layer of the teacher network,
- generating a pairwise teacher similarity matrix based on the teacher activation map,
- generating a pairwise student similarity matrix based on the student activation map, and
- minimizing a knowledge distillation loss defined as the difference between the pairwise teacher similarity matrix and the pairwise student similarity matrix.
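The loss described in the summary can be sketched in a few lines. The following NumPy sketch is an illustration, not the patented implementation: the function names are hypothetical, and the row-wise normalization of the similarity matrices and the 1/b² scaling of the loss are assumptions made to keep the loss scale-invariant, consistent with common similarity-preserving distillation formulations.

```python
import numpy as np

def pairwise_similarity(activation_map):
    """Row-normalized pairwise similarity matrix for a batch of activations.

    activation_map: array of shape (batch, ...) -- flattened per sample.
    Returns a (batch, batch) matrix of pairwise inner products,
    L2-normalized along each row (normalization is an assumption).
    """
    b = activation_map.shape[0]
    A = activation_map.reshape(b, -1)          # (b, d) flattened activations
    G = A @ A.T                                # (b, b) pairwise similarities
    norms = np.linalg.norm(G, axis=1, keepdims=True)
    return G / np.maximum(norms, 1e-12)        # guard against zero rows

def distillation_loss(teacher_act, student_act):
    """Difference between teacher and student pairwise similarity matrices.

    Squared Frobenius norm of the difference, divided by batch size
    squared (the scaling is an assumption for illustration).
    """
    b = teacher_act.shape[0]
    G_t = pairwise_similarity(teacher_act)
    G_s = pairwise_similarity(student_act)
    return np.sum((G_t - G_s) ** 2) / (b * b)
```

With identical teacher and student activations the loss is zero; it grows as the student's pairwise similarity structure diverges from the teacher's, which is what minimizing it during training encourages.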