MVQ: Towards Efficient DNN Compression and Acceleration with Masked Vector Quantization

Full description

Saved in:
Bibliographic details
Published in: arXiv.org 2024-12
Main authors: Li, Shuaiting; Wang, Chengxuan; Deng, Juncan; Wang, Zeyu; Ye, Zewen; Wang, Zongsheng; Shen, Haibin; Huang, Kejie
Format: Article
Language: eng
Subjects:
Online access: Full text
description Vector quantization (VQ) is a hardware-friendly DNN compression method that can reduce the storage cost and the weight-loading data width of hardware accelerators. However, conventional VQ techniques lead to significant accuracy loss because the important weights are not well preserved. To tackle this problem, a novel approach called MVQ is proposed, which aims at better approximating important weights with a limited number of codewords. At the algorithm level, our approach removes the less important weights through N:M pruning and then minimizes the vector clustering error between the remaining weights and the codewords with a masked k-means algorithm: only distances between the unpruned weights and the codewords are computed, and these are then used to update the codewords. At the architecture level, our accelerator implements vector quantization on an EWS (enhanced weight stationary) CNN accelerator and proposes a sparse systolic array design to maximize the benefits brought by masked vector quantization. Our algorithm is validated on various models for image classification, object detection, and segmentation tasks. Experimental results demonstrate that MVQ not only outperforms conventional vector quantization methods at comparable compression ratios but also reduces FLOPs. Under ASIC evaluation, our MVQ accelerator boosts energy efficiency by 2.3× and reduces the size of the systolic array by 55% compared with the base EWS accelerator. Compared to previous sparse accelerators, MVQ achieves 1.73× higher energy efficiency.
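The two algorithm-level steps in the abstract (N:M magnitude pruning followed by k-means in which pruned positions are excluded from both the assignment distance and the centroid update) can be sketched roughly as below. This is a minimal NumPy illustration, not the paper's implementation; the function names, the 2:4 pattern, and the initialization scheme are assumptions for the sketch.

```python
import numpy as np

def nm_prune_mask(w, n=2, m=4):
    """Binary mask keeping the n largest-magnitude weights in every
    group of m consecutive weights (N:M structured pruning)."""
    groups = np.abs(w).reshape(-1, m)
    keep = np.argsort(groups, axis=1)[:, m - n:]  # indices of the top-n per group
    mask = np.zeros_like(groups)
    np.put_along_axis(mask, keep, 1.0, axis=1)
    return mask.reshape(w.shape)

def masked_kmeans(vectors, masks, k, iters=20, seed=0):
    """k-means over weight sub-vectors where pruned positions (mask == 0)
    contribute neither to the assignment distance nor to the codeword update."""
    rng = np.random.default_rng(seed)
    codebook = vectors[rng.choice(len(vectors), k, replace=False)].copy()
    for _ in range(iters):
        # Masked squared distance: only unpruned weights are compared.
        d = ((vectors[:, None, :] - codebook[None, :, :]) ** 2
             * masks[:, None, :]).sum(-1)
        assign = d.argmin(1)
        for c in range(k):
            sel = assign == c
            if not sel.any():
                continue  # leave an empty codeword unchanged
            num = (vectors[sel] * masks[sel]).sum(0)
            den = masks[sel].sum(0)
            # Per-position mean over unpruned weights only.
            codebook[c] = np.where(den > 0, num / np.maximum(den, 1), codebook[c])
    return codebook, assign
```

The masked update is the key difference from plain k-means: a codeword position is averaged only over the vectors whose mask keeps that position, so codewords are spent on approximating the surviving (important) weights rather than the zeros introduced by pruning.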
doi 10.48550/arxiv.2412.10261
identifier EISSN: 2331-8422
subjects Accelerators
Algorithms
Arrays
Clustering
Codes
Compression ratio
Computer Science - Computer Vision and Pattern Recognition
Computer Science - Hardware Architecture
Energy efficiency
Hardware
Image classification
Image compression
Image segmentation
Object recognition