Activation Map Adaptation for Effective Knowledge Distillation
Model compression has become a recent trend due to the need to deploy neural networks on embedded and mobile devices; hence, both accuracy and efficiency are of critical importance. To strike a balance between them, a knowledge distillation strategy is proposed for general visual representation learning. It utilizes our well-designed activation map adaptive module to replace some blocks of the teacher network, adaptively exploring the most appropriate supervisory features during training, and uses the teacher's hidden-layer output to guide the training of the student network so as to transfer effective semantic information. To verify the effectiveness of our strategy, we apply the method to the CIFAR-10 dataset. Results demonstrate that the method boosts the accuracy of the student network by 0.6% with a 6.5% reduction in loss, and significantly improves its training speed.
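The abstract describes feature-based distillation in which an adaptation module aligns the student's activation maps with the teacher's hidden-layer output. The record contains no code; the following is a minimal PyTorch sketch of that general idea, not the authors' implementation — every name, shape, and hyperparameter in it (`ActivationAdapter`, `alpha`, `temperature`) is an illustrative assumption.

```python
# Minimal sketch of feature-based knowledge distillation with an adaptation
# layer, in the spirit of the abstract above. NOT the authors' code: all
# names, shapes, and hyperparameters here are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F


class ActivationAdapter(nn.Module):
    """Projects a student activation map to the teacher's channel width
    with a 1x1 convolution so the two maps can be compared directly."""

    def __init__(self, student_channels: int, teacher_channels: int):
        super().__init__()
        self.proj = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, student_feat: torch.Tensor) -> torch.Tensor:
        return self.proj(student_feat)


def distillation_loss(student_logits, teacher_logits,
                      student_feat, teacher_feat, adapter,
                      labels, alpha=0.5, temperature=4.0):
    """Combines hard-label, soft-label, and hidden-activation supervision."""
    # Hard-label cross-entropy on the ground-truth classes.
    ce = F.cross_entropy(student_logits, labels)

    # Hinton-style soft-label term on temperature-scaled logits.
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2

    # Hint term: after adaptation, the student's activation map should
    # match the teacher's hidden-layer output (the supervisory feature).
    hint = F.mse_loss(adapter(student_feat), teacher_feat.detach())

    return ce + alpha * kd + hint
```

In a setup like this, the adapter is trained jointly with the student and discarded at inference time, so it adds no cost to the deployed model.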
Saved in:
Main Authors: | Wu, Zhiyuan; Qi, Hong; Jiang, Yu; Zhao, Minghao; Cui, Chupeng; Yang, Zongmin; Xue, Xinhui |
---|---|
Format: | Article |
Language: | eng |
Subjects: | Computer Science - Computer Vision and Pattern Recognition; Computer Science - Learning |
Online Access: | Order full text |
creator | Wu, Zhiyuan; Qi, Hong; Jiang, Yu; Zhao, Minghao; Cui, Chupeng; Yang, Zongmin; Xue, Xinhui |
description | Model compression has become a recent trend due to the need to deploy neural networks on embedded and mobile devices; hence, both accuracy and efficiency are of critical importance. To strike a balance between them, a knowledge distillation strategy is proposed for general visual representation learning. It utilizes our well-designed activation map adaptive module to replace some blocks of the teacher network, adaptively exploring the most appropriate supervisory features during training, and uses the teacher's hidden-layer output to guide the training of the student network so as to transfer effective semantic information. To verify the effectiveness of our strategy, we apply the method to the CIFAR-10 dataset. Results demonstrate that the method boosts the accuracy of the student network by 0.6% with a 6.5% reduction in loss, and significantly improves its training speed. |
doi_str_mv | 10.48550/arxiv.2010.13500 |
format | Article |
identifier | DOI: 10.48550/arxiv.2010.13500 |
language | eng |
recordid | cdi_arxiv_primary_2010_13500 |
source | arXiv.org |
subjects | Computer Science - Computer Vision and Pattern Recognition; Computer Science - Learning |
title | Activation Map Adaptation for Effective Knowledge Distillation |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-04T21%3A57%3A50IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Activation%20Map%20Adaptation%20for%20Effective%20Knowledge%20Distillation&rft.au=Wu,%20Zhiyuan&rft.date=2020-10-26&rft_id=info:doi/10.48550/arxiv.2010.13500&rft_dat=%3Carxiv_GOX%3E2010_13500%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |