SYSTEMS AND METHODS RELATING TO KNOWLEDGE DISTILLATION IN NATURAL LANGUAGE PROCESSING MODELS
Format: Patent
Language: English
Online Access: Order full text
Summary: A method for creating a student model from a teacher model for knowledge distillation. The method includes: providing a first model; using a first instance of the first model to create the teacher model by training that instance on a training dataset; using a second instance of the first model to create the student model by training that instance on a subset of the training dataset; identifying corresponding layers in the teacher model and the student model; for each of the corresponding layers, computing a weight similarity criterion; ranking the corresponding layers according to the weight similarity criterion; selecting, based on the ranking, one or more of the corresponding layers for designation as one or more discard layers; and removing the one or more discard layers from the student model.
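The steps in the abstract — computing a weight similarity criterion for each pair of corresponding layers, ranking the pairs, and selecting discard layers from the ranking — can be sketched as follows. This is a minimal illustration, not the patented method: the abstract does not specify the similarity criterion or the selection rule, so cosine similarity between flattened weight tensors and "discard the most similar layers" are assumptions made here for concreteness, and all function names are hypothetical.

```python
import numpy as np

def weight_similarity(w_teacher, w_student):
    # Assumed criterion: cosine similarity between the flattened
    # weight tensors of a pair of corresponding layers.
    a, b = w_teacher.ravel(), w_student.ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def select_discard_layers(teacher_layers, student_layers, num_discard):
    # Score each pair of corresponding layers, rank by similarity,
    # and pick the top-ranked layers as discard candidates.
    # (Assumption: highly similar layers contribute least new
    # information to the student and are the ones removed.)
    scores = [
        (idx, weight_similarity(t, s))
        for idx, (t, s) in enumerate(zip(teacher_layers, student_layers))
    ]
    ranked = sorted(scores, key=lambda pair: pair[1], reverse=True)
    return [idx for idx, _ in ranked[:num_discard]]

# Toy example with three corresponding layers.
rng = np.random.default_rng(0)
teacher = [rng.standard_normal((4, 4)) for _ in range(3)]
student = [
    teacher[0].copy(),                                # identical weights
    teacher[1] + 0.5 * rng.standard_normal((4, 4)),   # perturbed copy
    rng.standard_normal((4, 4)),                      # unrelated weights
]
discard = select_discard_layers(teacher, student, num_discard=1)
print(discard)  # the identical layer ranks as most similar
```

In this sketch the identical layer pair scores a cosine similarity of 1.0 and is selected for removal; the actual criterion and selection direction in the patent may differ.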