RoSearch: Search for Robust Student Architectures When Distilling Pre-trained Language Models
Main authors:
Format: Article
Language: English
Subjects:
Online access: Order full text
Abstract: Pre-trained language models achieve outstanding performance in NLP tasks. Various knowledge distillation methods have been proposed to reduce the heavy computation and storage requirements of pre-trained language models. However, from our observations, student models acquired by knowledge distillation suffer from adversarial attacks, which limits their usage in security-sensitive scenarios. To overcome these security problems, RoSearch is proposed as a comprehensive framework that searches for student models with better adversarial robustness when performing knowledge distillation. A directed acyclic graph based search space is built, and an evolutionary search strategy is utilized to guide the search. Each searched architecture is trained by knowledge distillation on the pre-trained language model and then evaluated under a robustness-, accuracy- and efficiency-aware metric as environmental fitness. Experimental results show that RoSearch can improve the robustness of student models from 7%~18% up to 45.8%~47.8% on different datasets, with a weight compression ratio comparable to existing distillation methods (4.6$\times$~6.5$\times$ compression relative to the teacher model BERT_BASE) and a low accuracy drop. In addition, we summarize the relationship between student architecture and robustness through statistics of the searched models.
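
The abstract describes an evolutionary architecture search in which each candidate student is distilled from the teacher and then scored by a fitness combining adversarial robustness, accuracy and efficiency. The record gives no formula, so the Python sketch below only illustrates the general shape of such a loop under assumed ingredients: the weighted-sum fitness, the evaluate_* stubs, the bit-string genome and all parameter names are hypothetical and do not reproduce RoSearch's actual search space or metric.

```python
import random
from dataclasses import dataclass

# Hedged sketch of a robustness-, accuracy- and efficiency-aware
# evolutionary search loop. All names and formulas below are
# illustrative assumptions, not the RoSearch implementation.

@dataclass
class Candidate:
    genome: list          # stand-in encoding of a DAG-based student architecture
    fitness: float = 0.0

def evaluate_accuracy(genome):
    # Placeholder: would distill the student from the teacher (e.g. BERT_BASE)
    # and measure clean-task accuracy on the target dataset.
    return random.uniform(0.80, 0.92)

def evaluate_robustness(genome):
    # Placeholder: would measure accuracy under adversarial attack.
    return random.uniform(0.10, 0.50)

def evaluate_efficiency(genome):
    # Placeholder: would measure weight compression relative to the teacher.
    return random.uniform(0.15, 0.25)

def fitness(genome, w_acc=1.0, w_rob=1.0, w_eff=0.5):
    # Assumed weighted combination serving as environmental fitness.
    return (w_acc * evaluate_accuracy(genome)
            + w_rob * evaluate_robustness(genome)
            + w_eff * evaluate_efficiency(genome))

def mutate(genome):
    # Flip one gene to produce a neighbouring architecture encoding.
    child = genome.copy()
    i = random.randrange(len(child))
    child[i] = 1 - child[i]
    return child

def evolutionary_search(pop_size=8, genome_len=16, generations=20):
    population = [Candidate([random.randint(0, 1) for _ in range(genome_len)])
                  for _ in range(pop_size)]
    for cand in population:
        cand.fitness = fitness(cand.genome)
    for _ in range(generations):
        # Tournament selection, mutation, and replacement of the worst candidate.
        parent = max(random.sample(population, 3), key=lambda c: c.fitness)
        child = Candidate(mutate(parent.genome))
        child.fitness = fitness(child.genome)
        worst = min(range(len(population)), key=lambda i: population[i].fitness)
        if child.fitness > population[worst].fitness:
            population[worst] = child
    return max(population, key=lambda c: c.fitness)

if __name__ == "__main__":
    best = evolutionary_search()
    print("best fitness:", round(best.fitness, 3))
```

In an actual run, the three evaluate functions would be replaced by knowledge distillation, adversarial-attack evaluation, and parameter-count measurement on the candidate architecture rather than random placeholders.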
DOI: 10.48550/arxiv.2106.03613