SYSTEMS AND METHODS RELATING TO KNOWLEDGE DISTILLATION IN NATURAL LANGUAGE PROCESSING MODELS


Bibliographic Details
Authors: BUDUGUPPA, PAVAN; ELLURU, VEERA RAGHAVENDRA; SUNDARAM, RAMASUBRAMANIAN
Format: Patent
Language: English
Description
Abstract: A method for creating a student model from a teacher model for knowledge distillation. The method includes: providing a first model; using a first instance of the first model to create the teacher model by training that instance on a training dataset; using a second instance of the first model to create the student model by training that instance on a subset of the training dataset; identifying corresponding layers in the teacher model and the student model; for each of the corresponding layers, computing a weight similarity criterion; ranking the corresponding layers according to the weight similarity criterion; selecting, based on the ranking, one or more of the corresponding layers for designation as one or more discard layers; and removing the one or more discard layers from the student model.
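The ranking-and-discard step of the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: cosine similarity over flattened layer weights is assumed as the "weight similarity criterion", and layers whose student weights most closely track the teacher's are assumed to be the discard candidates; the abstract leaves both choices open, and the helper names (`cosine_similarity`, `select_discard_layers`) are hypothetical.

```python
import numpy as np

def cosine_similarity(a, b):
    """One possible weight similarity criterion: cosine similarity
    between two flattened layer-weight tensors."""
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def select_discard_layers(teacher_weights, student_weights, n_discard):
    """Rank corresponding teacher/student layers by weight similarity
    and return the indices of the n_discard most similar layers.

    Assumption (not stated in the abstract): the most similar layers
    are treated as redundant in the student and discarded first.
    """
    scores = [(i, cosine_similarity(t, s))
              for i, (t, s) in enumerate(zip(teacher_weights, student_weights))]
    # Rank by the similarity criterion, highest similarity first.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [i for i, _ in scores[:n_discard]]
```

A usage sketch: after training the teacher on the full dataset and the student on its subset, collect each model's per-layer weight arrays in order, call `select_discard_layers(teacher_layers, student_layers, k)`, and rebuild the student without the returned layer indices.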