Long Short-Term Sample Distillation

In the past decade, there has been substantial progress in training increasingly deep neural networks. Recent advances within the teacher-student training paradigm have established that information about past training updates shows promise as a source of guidance during subsequent training steps. Based on this notion, in this paper, we propose Long Short-Term Sample Distillation, a novel training policy that simultaneously leverages multiple phases of the previous training process to guide the later training updates to a neural network, while efficiently proceeding in just a single generation pass. With Long Short-Term Sample Distillation, the supervision signal for each sample is decomposed into two parts: a long-term signal and a short-term one. The long-term teacher draws on snapshots from several epochs ago in order to provide steadfast guidance and to guarantee teacher-student differences, while the short-term one yields more up-to-date cues with the goal of enabling higher-quality updates. Moreover, the teachers for each sample are unique, such that, overall, the model learns from a very diverse set of teachers. Comprehensive experimental results across a range of vision and NLP tasks demonstrate the effectiveness of this new training method.
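
The abstract describes the training policy only at a high level. As a rough illustration, the sketch below shows, in PyTorch, how a single-generation self-distillation step with a long-term and a short-term teacher snapshot could be wired up. The loss weights, temperature, snapshot schedule, and function names are illustrative assumptions rather than the authors' published settings, and the per-sample teacher assignment emphasized in the abstract is simplified here to batch-level teachers.

# Minimal sketch (assumptions noted above): single-generation self-distillation
# with a long-term and a short-term snapshot of the model acting as teachers.
import copy
import torch
import torch.nn.functional as F

def lsts_loss(model, long_teacher, short_teacher, x, y,
              lambda_long=0.3, lambda_short=0.3, T=2.0):
    # Supervised signal from the ground-truth labels.
    logits = model(x)
    loss = F.cross_entropy(logits, y)
    # Teachers are frozen snapshots of the model's own earlier self.
    with torch.no_grad():
        long_logits = long_teacher(x)
        short_logits = short_teacher(x)
    # Softened student distribution (log-probabilities for kl_div).
    log_p = F.log_softmax(logits / T, dim=-1)
    # Long-term signal: a stale snapshot provides steadier guidance.
    loss = loss + lambda_long * T * T * F.kl_div(
        log_p, F.softmax(long_logits / T, dim=-1), reduction="batchmean")
    # Short-term signal: a recent snapshot provides more up-to-date cues.
    loss = loss + lambda_short * T * T * F.kl_div(
        log_p, F.softmax(short_logits / T, dim=-1), reduction="batchmean")
    return loss

def refresh_teachers(model, long_teacher, short_teacher, epoch, k=5):
    # Assumed schedule: the short-term teacher is refreshed every epoch,
    # the long-term one only every k epochs (a snapshot from several epochs ago).
    short_teacher.load_state_dict(copy.deepcopy(model.state_dict()))
    if epoch % k == 0:
        long_teacher.load_state_dict(copy.deepcopy(model.state_dict()))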

Bibliographic Details
Main Authors: Jiang, Liang; Wen, Zujie; Liang, Zhongping; Wang, Yafang; de Melo, Gerard; Li, Zhe; Ma, Liangzhuang; Zhang, Jiaxing; Li, Xiaolong; Qi, Yuan
Format: Article
Language: eng
Subjects: Computer Science - Computation and Language; Computer Science - Computer Vision and Pattern Recognition
Online Access: Order full text
creator Jiang, Liang
Wen, Zujie
Liang, Zhongping
Wang, Yafang
de Melo, Gerard
Li, Zhe
Ma, Liangzhuang
Zhang, Jiaxing
Li, Xiaolong
Qi, Yuan
description In the past decade, there has been substantial progress in training increasingly deep neural networks. Recent advances within the teacher-student training paradigm have established that information about past training updates shows promise as a source of guidance during subsequent training steps. Based on this notion, in this paper, we propose Long Short-Term Sample Distillation, a novel training policy that simultaneously leverages multiple phases of the previous training process to guide the later training updates to a neural network, while efficiently proceeding in just a single generation pass. With Long Short-Term Sample Distillation, the supervision signal for each sample is decomposed into two parts: a long-term signal and a short-term one. The long-term teacher draws on snapshots from several epochs ago in order to provide steadfast guidance and to guarantee teacher-student differences, while the short-term one yields more up-to-date cues with the goal of enabling higher-quality updates. Moreover, the teachers for each sample are unique, such that, overall, the model learns from a very diverse set of teachers. Comprehensive experimental results across a range of vision and NLP tasks demonstrate the effectiveness of this new training method.
doi_str_mv 10.48550/arxiv.2003.00739
format Article
creationdate 2020-03-02
rights http://arxiv.org/licenses/nonexclusive-distrib/1.0
oa free_for_read
linktorsrc https://arxiv.org/abs/2003.00739
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2003.00739
language eng
recordid cdi_arxiv_primary_2003_00739
source arXiv.org
subjects Computer Science - Computation and Language
Computer Science - Computer Vision and Pattern Recognition
title Long Short-Term Sample Distillation
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T16%3A18%3A26IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Long%20Short-Term%20Sample%20Distillation&rft.au=Jiang,%20Liang&rft.date=2020-03-02&rft_id=info:doi/10.48550/arxiv.2003.00739&rft_dat=%3Carxiv_GOX%3E2003_00739%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true