Activation Map Adaptation for Effective Knowledge Distillation

Model compression has become a recent trend due to the requirement of deploying neural networks on embedded and mobile devices, so both accuracy and efficiency are of critical importance. To explore a balance between them, a knowledge distillation strategy is proposed for general visual representation learning. It uses a well-designed activation map adaptive module to replace some blocks of the teacher network, adaptively exploring the most appropriate supervisory features during training, and uses the teacher's hidden-layer output to guide the training of the student network so that effective semantic information is transferred. To verify the effectiveness of the strategy, the method is applied to the CIFAR-10 dataset. Results demonstrate that it boosts the accuracy of the student network by 0.6% with a 6.5% reduction in loss, and significantly improves its training speed.
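
The abstract describes distillation from a teacher's intermediate activation maps rather than from logits alone. The authors' code is not part of this record; the snippet below is a minimal PyTorch-style sketch, under the assumption that the student's feature map is projected into the teacher's channel space by a 1x1 convolution and matched with a mean-squared-error term on top of the usual cross-entropy task loss. The class name ActivationMapDistiller, the alpha weighting, and the pooling-based spatial alignment are illustrative assumptions, not the paper's activation map adaptive module.

```python
# Hypothetical sketch of activation-map-based knowledge distillation (not the authors' code).
# A 1x1 conv adapter projects the student's activation map to the teacher's channel count,
# and an MSE loss on the activation maps supplements the usual cross-entropy task loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ActivationMapDistiller(nn.Module):
    def __init__(self, student_channels: int, teacher_channels: int, alpha: float = 0.5):
        super().__init__()
        # Projects the student's activation map into the teacher's feature space.
        self.adapter = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)
        self.alpha = alpha  # weight of the distillation term relative to the task loss

    def forward(self, student_feat, teacher_feat, student_logits, labels):
        # Match spatial size if the two networks downsample differently.
        if student_feat.shape[-2:] != teacher_feat.shape[-2:]:
            teacher_feat = F.adaptive_avg_pool2d(teacher_feat, student_feat.shape[-2:])
        # Detach the teacher so gradients flow only into the student and the adapter.
        distill_loss = F.mse_loss(self.adapter(student_feat), teacher_feat.detach())
        task_loss = F.cross_entropy(student_logits, labels)
        return task_loss + self.alpha * distill_loss

# Usage sketch: in practice the feature maps and logits would come from forward hooks
# or modified forward passes on the teacher and student networks.
if __name__ == "__main__":
    criterion = ActivationMapDistiller(student_channels=32, teacher_channels=64)
    s_feat = torch.randn(8, 32, 8, 8)      # student activation map
    t_feat = torch.randn(8, 64, 8, 8)      # teacher activation map
    logits = torch.randn(8, 10)            # student predictions (e.g. CIFAR-10 classes)
    labels = torch.randint(0, 10, (8,))
    loss = criterion(s_feat, t_feat, logits, labels)
    loss.backward()
```

Detaching the teacher features is a common design choice in feature-based distillation: the teacher stays fixed as a source of supervisory activation maps while only the student and the lightweight adapter are updated.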

Bibliographic details
Main authors: Wu, Zhiyuan, Qi, Hong, Jiang, Yu, Zhao, Minghao, Cui, Chupeng, Yang, Zongmin, Xue, Xinhui
Format: Article
Language: eng
Subjects: Computer Science - Computer Vision and Pattern Recognition; Computer Science - Learning
Online access: Order full text
creator Wu, Zhiyuan
Qi, Hong
Jiang, Yu
Zhao, Minghao
Cui, Chupeng
Yang, Zongmin
Xue, Xinhui
description Model compression has become a recent trend due to the requirement of deploying neural networks on embedded and mobile devices, so both accuracy and efficiency are of critical importance. To explore a balance between them, a knowledge distillation strategy is proposed for general visual representation learning. It uses a well-designed activation map adaptive module to replace some blocks of the teacher network, adaptively exploring the most appropriate supervisory features during training, and uses the teacher's hidden-layer output to guide the training of the student network so that effective semantic information is transferred. To verify the effectiveness of the strategy, the method is applied to the CIFAR-10 dataset. Results demonstrate that it boosts the accuracy of the student network by 0.6% with a 6.5% reduction in loss, and significantly improves its training speed.
doi_str_mv 10.48550/arxiv.2010.13500
format Article
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2010.13500
language eng
recordid cdi_arxiv_primary_2010_13500
source arXiv.org
subjects Computer Science - Computer Vision and Pattern Recognition
Computer Science - Learning
title Activation Map Adaptation for Effective Knowledge Distillation
url https://arxiv.org/abs/2010.13500