Fractional Skipping: Towards Finer-Grained Dynamic CNN Inference

While increasingly deep networks are still in general desired for achieving state-of-the-art performance, for many specific inputs a simpler network might already suffice. Existing works exploited this observation by learning to skip convolutional layers in an input-dependent manner. However, we argue their binary decision scheme, i.e., either fully executing or completely bypassing one layer for a specific input, can be enhanced by introducing finer-grained, "softer" decisions. We therefore propose a Dynamic Fractional Skipping (DFS) framework. The core idea of DFS is to hypothesize layer-wise quantization (to different bitwidths) as intermediate "soft" choices to be made between fully utilizing and skipping a layer. For each input, DFS dynamically assigns a bitwidth to both weights and activations of each layer, where fully executing and skipping could be viewed as two "extremes" (i.e., full bitwidth and zero bitwidth). In this way, DFS can "fractionally" exploit a layer's expressive power during input-adaptive inference, enabling finer-grained accuracy-computational cost trade-offs. It presents a unified view to link input-adaptive layer skipping and input-adaptive hybrid quantization. Extensive experimental results demonstrate the superior trade-off between computational cost and model expressive power (accuracy) achieved by DFS. Visualizations also indicate a smooth and consistent transition in the DFS behaviors, especially the learned choices between layer skipping and different quantizations as the total computational budget varies, validating our hypothesis that layer quantization can be viewed as an intermediate variant of layer skipping. Our source code and supplementary material are available at https://github.com/Torment123/DFS.
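
The abstract describes the mechanism only in prose, so the short PyTorch-style sketch below illustrates one reading of it: a small gate inspects each input and selects a bitwidth for a residual block, where zero bits means the block is skipped entirely and full precision means it runs unchanged. The block and gate design, the candidate bitwidths, and the min-max quantizer here are illustrative assumptions made for exposition, not the authors' implementation; their actual code and the gate-training procedure are in the linked repository.

# Minimal sketch of the "fractional skipping" idea from the abstract. All names
# (FractionalSkipBlock, bit_choices, the gate design) are illustrative
# assumptions, not the DFS authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


def quantize(x: torch.Tensor, bits: int) -> torch.Tensor:
    """Uniform min-max quantization to `bits` bits (straight-through gradients
    and per-channel scaling are omitted for brevity)."""
    if bits >= 32:  # treat 32 as full precision
        return x
    lo, hi = x.min(), x.max()
    scale = (hi - lo).clamp(min=1e-8) / (2 ** bits - 1)
    return ((x - lo) / scale).round() * scale + lo


class FractionalSkipBlock(nn.Module):
    """Residual block executed at a per-input bitwidth chosen by a small gate;
    bitwidth 0 means the block is skipped (identity shortcut only)."""

    def __init__(self, channels: int, bit_choices=(0, 4, 8, 32)):
        super().__init__()
        self.bit_choices = bit_choices
        self.conv = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(channels)
        # Gate: global-average-pooled features -> logits over bitwidth options.
        self.gate = nn.Linear(channels, len(bit_choices))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = self.gate(x.mean(dim=(2, 3)))   # (N, num_choices)
        choice = logits.argmax(dim=1)            # hard, non-differentiable pick;
                                                 # training would need an RL- or
                                                 # relaxation-based scheme (omitted)
        outputs = []
        for i in range(x.size(0)):               # per-sample execution
            bits = self.bit_choices[choice[i].item()]
            xi = x[i:i + 1]
            if bits == 0:                        # "zero bitwidth" = skip the layer
                outputs.append(xi)
                continue
            w = quantize(self.conv.weight, bits)  # quantize weights
            a = quantize(xi, bits)                # quantize activations
            out = F.conv2d(a, w, padding=1)
            outputs.append(F.relu(self.bn(out) + xi))  # residual connection
        return torch.cat(outputs, dim=0)


if __name__ == "__main__":
    block = FractionalSkipBlock(channels=16)
    y = block(torch.randn(8, 16, 32, 32))
    print(y.shape)  # torch.Size([8, 16, 32, 32])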


Bibliographic Details
Published in: arXiv.org, 2020-01
EISSN: 2331-8422
Main Authors: Shen, Jianghao; Fu, Yonggan; Wang, Yue; Xu, Pengfei; Wang, Zhangyang; Lin, Yingyan
Format: Article
Language: English (eng)
Subjects: Computational efficiency; Computing costs; Inference; Measurement; Model accuracy; Source code
Online Access: Full text