RoSearch: Search for Robust Student Architectures When Distilling Pre-trained Language Models
creator | Guo, Xin; Yang, Jianlei; Zhou, Haoyi; Ye, Xucheng; Li, Jianxin |
description | Pre-trained language models achieve outstanding performance in NLP tasks. Various knowledge distillation methods have been proposed to reduce the heavy computation and storage requirements of pre-trained language models. However, from our observations, student models acquired by knowledge distillation suffer from adversarial attacks, which limits their usage in security-sensitive scenarios. To overcome these security problems, RoSearch is proposed as a comprehensive framework for searching student models with better adversarial robustness when performing knowledge distillation. A directed acyclic graph based search space is built, and an evolutionary search strategy is utilized to guide the search. Each searched architecture is trained by knowledge distillation on the pre-trained language model and then evaluated under a robustness-, accuracy- and efficiency-aware metric as environmental fitness. Experimental results show that RoSearch can improve the robustness of student models from 7%~18% up to 45.8%~47.8% on different datasets, with a weight compression ratio comparable to existing distillation methods (4.6$\times$~6.5$\times$ compression relative to the teacher model BERT_BASE) and a low accuracy drop. In addition, we summarize the relationship between student architecture and robustness through statistics of the searched models. (A minimal sketch of the described search loop follows this record.) |
doi_str_mv | 10.48550/arxiv.2106.03613 |
format | Article |
identifier | DOI: 10.48550/arxiv.2106.03613 |
language | eng |
recordid | cdi_arxiv_primary_2106_03613 |
source | arXiv.org |
subjects | Computer Science - Artificial Intelligence; Computer Science - Computation and Language; Computer Science - Learning |
title | RoSearch: Search for Robust Student Architectures When Distilling Pre-trained Language Models |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-01T15%3A39%3A50IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=RoSearch:%20Search%20for%20Robust%20Student%20Architectures%20When%20Distilling%20Pre-trained%20Language%20Models&rft.au=Guo,%20Xin&rft.date=2021-06-07&rft_id=info:doi/10.48550/arxiv.2106.03613&rft_dat=%3Carxiv_GOX%3E2106_03613%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
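
The abstract above describes an evolutionary search over a DAG-based space of student architectures, where each candidate is trained by distillation from the teacher and scored under a robustness-, accuracy- and efficiency-aware fitness. The following is a minimal, illustrative sketch of such a loop, assuming hypothetical names (`Candidate`, `train_and_eval`, `mutate`) and a placeholder weighted-sum fitness; it does not reproduce the authors' actual search space, fitness definition, or implementation.

```python
# Illustrative sketch of an evolutionary architecture search with a combined
# robustness/accuracy/efficiency fitness. All names and the fitness weighting
# are assumptions; the paper defines its own metric and operators.
import random
from dataclasses import dataclass


@dataclass
class Candidate:
    genes: list          # encodes a DAG-based student architecture
    fitness: float = 0.0


def fitness(robustness, accuracy, params, alpha=1.0, beta=1.0, gamma=0.1):
    """Combine robustness, accuracy and model size into one score (placeholder)."""
    return alpha * robustness + beta * accuracy - gamma * params


def evolve(population, n_generations, mutate, train_and_eval):
    """Evolutionary search: distill-train each candidate, score it, keep the
    fittest half, and refill the population with mutated survivors."""
    for _ in range(n_generations):
        for cand in population:
            # train_and_eval is expected to distill the candidate from the
            # teacher and return (robustness, accuracy, parameter count).
            rob, acc, params = train_and_eval(cand)
            cand.fitness = fitness(rob, acc, params)
        population.sort(key=lambda c: c.fitness, reverse=True)
        survivors = population[: len(population) // 2]
        children = [mutate(random.choice(survivors)) for _ in survivors]
        population = survivors + children
    return max(population, key=lambda c: c.fitness)
```

Given callables that mutate an architecture encoding and that run distillation training plus robustness evaluation, `evolve` would return the highest-fitness student found under this assumed scoring scheme.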