Activation Map Adaptation for Effective Knowledge Distillation

Model compression has become a recent trend due to the requirement of deploying neural networks on embedded and mobile devices, so both accuracy and efficiency are of critical importance. To explore a balance between them, a knowledge distillation strategy is proposed for general visual representation learning. It uses a well-designed activation map adaptive module to replace some blocks of the teacher network, adaptively exploring the most appropriate supervisory features during training, and uses the teacher's hidden-layer output to guide the training of the student network so that effective semantic information is transferred. To verify the effectiveness of the strategy, the method is applied to the CIFAR-10 dataset. Results demonstrate that it boosts the accuracy of the student network by 0.6% with a 6.5% reduction in loss, and significantly improves its training speed.
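
The abstract describes distillation from a teacher's intermediate activation maps rather than from logits alone. The authors' code is not part of this record; the snippet below is a minimal PyTorch-style sketch, under the assumption that the student's feature map is projected into the teacher's channel space by a 1x1 convolution and matched with a mean-squared-error term on top of the usual cross-entropy task loss. The class name ActivationMapDistiller, the alpha weighting, and the pooling-based spatial alignment are illustrative assumptions, not the paper's activation map adaptive module.

```python
# Hypothetical sketch of activation-map-based knowledge distillation (not the authors' code).
# A 1x1 conv adapter projects the student's activation map to the teacher's channel count,
# and an MSE loss on the activation maps supplements the usual cross-entropy task loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ActivationMapDistiller(nn.Module):
    def __init__(self, student_channels: int, teacher_channels: int, alpha: float = 0.5):
        super().__init__()
        # Projects the student's activation map into the teacher's feature space.
        self.adapter = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)
        self.alpha = alpha  # weight of the distillation term relative to the task loss

    def forward(self, student_feat, teacher_feat, student_logits, labels):
        # Match spatial size if the two networks downsample differently.
        if student_feat.shape[-2:] != teacher_feat.shape[-2:]:
            teacher_feat = F.adaptive_avg_pool2d(teacher_feat, student_feat.shape[-2:])
        # Detach the teacher so gradients flow only into the student and the adapter.
        distill_loss = F.mse_loss(self.adapter(student_feat), teacher_feat.detach())
        task_loss = F.cross_entropy(student_logits, labels)
        return task_loss + self.alpha * distill_loss

# Usage sketch: in practice the feature maps and logits would come from forward hooks
# or modified forward passes on the teacher and student networks.
if __name__ == "__main__":
    criterion = ActivationMapDistiller(student_channels=32, teacher_channels=64)
    s_feat = torch.randn(8, 32, 8, 8)      # student activation map
    t_feat = torch.randn(8, 64, 8, 8)      # teacher activation map
    logits = torch.randn(8, 10)            # student predictions (e.g. CIFAR-10 classes)
    labels = torch.randint(0, 10, (8,))
    loss = criterion(s_feat, t_feat, logits, labels)
    loss.backward()
```

Detaching the teacher features is a common design choice in feature-based distillation: the teacher stays fixed as a source of supervisory activation maps while only the student and the lightweight adapter are updated.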

Bibliographic details
Main authors: Wu, Zhiyuan, Qi, Hong, Jiang, Yu, Zhao, Minghao, Cui, Chupeng, Yang, Zongmin, Xue, Xinhui
Format: Article
Language: eng
Subjects: Computer Science - Computer Vision and Pattern Recognition; Computer Science - Learning
Online access: Order full text
creator Wu, Zhiyuan
Qi, Hong
Jiang, Yu
Zhao, Minghao
Cui, Chupeng
Yang, Zongmin
Xue, Xinhui
description Model compression has become a recent trend due to the requirement of deploying neural networks on embedded and mobile devices, so both accuracy and efficiency are of critical importance. To explore a balance between them, a knowledge distillation strategy is proposed for general visual representation learning. It uses a well-designed activation map adaptive module to replace some blocks of the teacher network, adaptively exploring the most appropriate supervisory features during training, and uses the teacher's hidden-layer output to guide the training of the student network so that effective semantic information is transferred. To verify the effectiveness of the strategy, the method is applied to the CIFAR-10 dataset. Results demonstrate that it boosts the accuracy of the student network by 0.6% with a 6.5% reduction in loss, and significantly improves its training speed.
doi_str_mv 10.48550/arxiv.2010.13500
format Article
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2010.13500
language eng
recordid cdi_arxiv_primary_2010_13500
source arXiv.org
subjects Computer Science - Computer Vision and Pattern Recognition
Computer Science - Learning
title Activation Map Adaptation for Effective Knowledge Distillation
url https://arxiv.org/abs/2010.13500