From Mimicking to Integrating: Knowledge Integration for Pre-Trained Language Models
Main Authors:
Format: Article
Language: English
Abstract: Investigating better ways to reuse released pre-trained language models (PLMs)
can significantly reduce computational cost and potential environmental side effects. This
paper explores a novel PLM reuse paradigm, Knowledge Integration (KI). Without human
annotations available, KI aims to merge the knowledge from different teacher PLMs, each of
which specializes in a different classification problem, into a versatile student model. To
achieve this, we first derive the correlation between virtual golden supervision and teacher
predictions. We then design a Model Uncertainty-aware Knowledge Integration (MUKI) framework
to recover the golden supervision for the student. Specifically, MUKI adopts Monte-Carlo
Dropout to estimate model uncertainty for the supervision integration. An instance-wise
re-weighting mechanism based on the margin of uncertainty scores is further incorporated to
deal with potentially conflicting supervision from teachers. Experimental results demonstrate
that MUKI achieves substantial improvements over baselines on benchmark datasets. Further
analysis shows that MUKI generalizes well to merging teacher models with heterogeneous
architectures, and even teachers that specialize in cross-lingual datasets.
DOI: 10.48550/arxiv.2210.05230
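
The abstract names two mechanisms: Monte-Carlo Dropout for estimating each teacher's uncertainty, and an instance-wise re-weighting based on the margin between the teachers' uncertainty scores. The snippet below is a minimal, illustrative sketch of how those two ideas could fit together; it is not the authors' released implementation. The predictive-entropy uncertainty measure, the sigmoid-based soft weighting, the function names (`mc_dropout_predict`, `integrate_supervision`), and the toy teacher models are all assumptions made for illustration.

```python
# Illustrative sketch (assumptions, not the MUKI reference code):
# 1) Monte-Carlo Dropout uncertainty estimation per teacher.
# 2) Margin-based instance re-weighting when combining teacher supervision
#    over the union of the teachers' (disjoint) label spaces.
import torch
import torch.nn as nn
import torch.nn.functional as F


def mc_dropout_predict(model: nn.Module, x: torch.Tensor, n_samples: int = 8):
    """Run the model with dropout active and return mean class probabilities
    plus a per-instance uncertainty score (predictive entropy, an assumption)."""
    model.train()  # keep dropout layers stochastic
    with torch.no_grad():
        probs = torch.stack(
            [F.softmax(model(x), dim=-1) for _ in range(n_samples)]
        )  # (n_samples, batch, n_classes)
    mean_probs = probs.mean(dim=0)  # (batch, n_classes)
    entropy = -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(-1)  # (batch,)
    return mean_probs, entropy


def integrate_supervision(p_a, u_a, p_b, u_b, temperature: float = 1.0):
    """Combine two teachers' predictions into one soft target per instance.
    A large uncertainty margin |u_a - u_b| shifts the weight toward the more
    certain teacher; a small margin yields a near-even mixture."""
    margin = u_b - u_a  # > 0 means teacher A is more certain on this instance
    w_a = torch.sigmoid(margin / temperature).unsqueeze(-1)  # (batch, 1)
    # Concatenate the two label spaces, since each teacher specializes in a
    # different classification problem; rows still sum to 1.
    return torch.cat([w_a * p_a, (1.0 - w_a) * p_b], dim=-1)


if __name__ == "__main__":
    # Toy teachers over disjoint label sets (3 and 4 classes), just to run the sketch.
    teacher_a = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Dropout(0.1), nn.Linear(32, 3))
    teacher_b = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Dropout(0.1), nn.Linear(32, 4))
    x = torch.randn(5, 16)

    p_a, u_a = mc_dropout_predict(teacher_a, x)
    p_b, u_b = mc_dropout_predict(teacher_b, x)
    soft_target = integrate_supervision(p_a, u_a, p_b, u_b)  # (5, 7) soft labels for the student
    print(soft_target.shape, soft_target.sum(-1))  # each row sums to ~1
```

In this sketch the resulting soft targets would serve as the recovered supervision for training the student over the union of the teachers' label sets; the exact weighting scheme used by MUKI may differ from the sigmoid mixture assumed here.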