Model distillation method and related equipment
Format: Patent
Language: Chinese; English
Abstract: The invention relates to the field of artificial intelligence and discloses a model distillation method. At a first computation node of a computation-node cluster, a student model is distilled using a partial model of the student model and a partial model of the teacher model, and the gradient back-propagation of the distillation is performed entirely inside the first computation node, so that the node completes the distillation of the network layers it is responsible for without depending on other computation nodes. This yields higher utilization of computing resources and thereby accelerates the distillation process.
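To make the idea concrete, the following is a minimal sketch (in PyTorch, which the patent does not specify) of how one computation node could distill only the network layers it is responsible for: the node holds a frozen partial teacher model and the corresponding partial student model, matches their outputs, and back-propagates the loss only through its own student layers, so no gradients need to be exchanged with other nodes. The class name NodePartialDistiller, the MSE feature-matching loss, and the particular layer split are illustrative assumptions, not the patent's specified implementation.

```python
# A minimal, self-contained sketch (assumed PyTorch) of per-node distillation in
# which gradient back-propagation stays inside one computation node.
# The layer split, the MSE feature-matching loss, and the way inputs reach the
# node are illustrative assumptions, not the patent's specified implementation.
import torch
import torch.nn as nn


class NodePartialDistiller:
    """Distills only the student layers assigned to this computation node."""

    def __init__(self, teacher_layers: nn.Module, student_layers: nn.Module, lr: float = 1e-3):
        self.teacher_layers = teacher_layers.eval()        # frozen partial teacher model
        for p in self.teacher_layers.parameters():
            p.requires_grad_(False)
        self.student_layers = student_layers               # trainable partial student model
        self.opt = torch.optim.Adam(self.student_layers.parameters(), lr=lr)
        self.loss_fn = nn.MSELoss()                        # assumed feature-matching loss

    def step(self, node_input: torch.Tensor) -> float:
        """One distillation step; gradients never leave this node."""
        with torch.no_grad():
            teacher_out = self.teacher_layers(node_input)  # target activations
        student_out = self.student_layers(node_input)
        loss = self.loss_fn(student_out, teacher_out)
        self.opt.zero_grad()
        loss.backward()        # back-propagation confined to this node's student layers
        self.opt.step()
        return loss.item()


# Usage: pretend this is the "first computation node", holding the partial teacher
# and partial student models for the layers assigned to it; its input batch would
# normally come from the data loader or from a preceding node's forward pass.
if __name__ == "__main__":
    teacher_part = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 64))
    student_part = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 64))
    node = NodePartialDistiller(teacher_part, student_part)
    for _ in range(3):
        batch = torch.randn(16, 32)                        # stand-in for the node's input batch
        print("local distillation loss:", node.step(batch))
```

Because the loss at each node depends only on that node's partial teacher and partial student models, the backward pass stops at the node boundary; this is what allows the nodes of the cluster to run their distillation steps independently, which is the source of the higher resource utilization and faster distillation described in the abstract.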