Learning to Explore Distillability and Sparsability: A Joint Framework for Model Compression

Deep learning usually achieves excellent performance at the expense of heavy computation. Recently, model compression has become a popular way of reducing this computation. Compression can be achieved through knowledge distillation or filter pruning: knowledge distillation improves the accuracy of a lightweight network, while filter pruning removes redundant structure from a cumbersome network. Although these are two different routes to model compression, few methods consider both of them simultaneously. In this paper, we revisit model compression and define two attributes of a model: distillability and sparsability, which reflect how much useful knowledge can be distilled and how large a pruning ratio can be obtained, respectively. Guided by our observations and considering both accuracy and model size, a dynamic distillability-and-sparsability learning framework (DDSL) is introduced for model compression. DDSL consists of a teacher, a student and a dean. Knowledge is distilled from the teacher to guide the student, while the dean controls the training process by dynamically adjusting the distillation supervision and the sparsity supervision in a meta-learning framework. An alternating direction method of multipliers (ADMM)-based knowledge distillation-with-pruning (KDP) joint optimization algorithm is proposed to train the model. Extensive experimental results show that DDSL outperforms 24 state-of-the-art methods, including both knowledge distillation and filter pruning methods.
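As a purely illustrative reading of the abstract, the training objective of such a framework can be thought of as a weighted sum of a task loss, a distillation loss, and a sparsity penalty, with the weights adjusted during training. The minimal PyTorch-style sketch below shows one way to write such a combined loss; the function name, the choice of an L1 penalty on BatchNorm scale factors as the sparsity proxy, and the externally supplied weights alpha and beta are assumptions made for illustration only, not the authors' DDSL/KDP algorithm (which additionally involves a dean network, meta-learning, and ADMM-based updates).

    # Illustrative sketch only; names and weighting choices are assumptions,
    # not the implementation described in the paper.
    import torch
    import torch.nn.functional as F

    def compression_loss(student_logits, teacher_logits, labels, student_model,
                         alpha, beta, temperature=4.0):
        """Task loss + distillation loss + sparsity penalty, with external
        weights alpha (distillation) and beta (sparsity) that a controller
        could adjust over the course of training."""
        # Standard supervised cross-entropy on the student's predictions.
        ce = F.cross_entropy(student_logits, labels)

        # Knowledge distillation: KL divergence between softened
        # teacher and student output distributions.
        kd = F.kl_div(
            F.log_softmax(student_logits / temperature, dim=1),
            F.softmax(teacher_logits / temperature, dim=1),
            reduction="batchmean",
        ) * temperature ** 2

        # A simple structured-sparsity proxy: L1 penalty on BatchNorm scale
        # factors, pushing whole channels toward zero so they can be pruned.
        sparsity = sum(
            m.weight.abs().sum()
            for m in student_model.modules()
            if isinstance(m, torch.nn.BatchNorm2d)
        )

        return ce + alpha * kd + beta * sparsity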

Bibliographic Details
Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023-03, Vol. 45 (3), pp. 3378-3395
Main authors: Liu, Yufan; Cao, Jiajiong; Li, Bing; Hu, Weiming; Maybank, Stephen
Format: Article
Language: English
DOI: 10.1109/TPAMI.2022.3185317
ISSN: 0162-8828
EISSN: 1939-3539; 2160-9292
Online access: Order full text

Subjects:
Algorithms
Analytical models
Computation
Computational modeling
Computer architecture
Deep learning
Distillation
filter pruning
Heuristic algorithms
Knowledge
Knowledge distillation
Knowledge engineering
Learning
Machine learning
Model accuracy
Optimization
structured sparsity pruning
Teachers
Training