The Impact of 8- and 4-Bit Quantization on the Accuracy and Silicon Area Footprint of Tiny Neural Networks

In the field of embedded and edge devices, efforts have been made to make deep neural network models smaller because of the limited memory available and the low computational efficiency of such hardware. Typical model footprints are under 100 KB. However, for some applications, models of this size are still too large. In low-voltage sensors, signals must be processed, classified, or predicted with an order of magnitude less memory. Model downsizing can be performed by limiting the number of model parameters or by quantizing their weights. Both types of operation have a negative impact on the accuracy of the deep network. This study tested the effect of these model downscaling techniques on accuracy. The main idea was to reduce neural network models to 3 k parameters or fewer. Tests were conducted on three different neural network architectures in the context of three separate research problems, modeling realistic tasks for small networks. The impact of the reduction on network accuracy depends mainly on the network's initial size: for a network reduced from 40 k parameters, accuracy dropped by 16 percentage points, and for a network with 20 k parameters, by 8 percentage points. To obtain the best results, knowledge distillation and quantization-aware training were used during training. Thanks to this, the accuracy of the 4-bit networks did not differ significantly from that of the 8-bit ones, and both were approximately four percentage points worse than the full-precision networks. For the fully connected network, synthesis to an ASIC (application-specific integrated circuit) was also performed to demonstrate the reduction in the silicon area occupied by the model. The 4-bit quantization reduces the silicon area footprint by 90%.

Bibliographic Details
Published in: Electronics (Basel), 2025-01, Vol. 14 (1), p. 14
Main authors: Tumialis, Paweł, Skierkowski, Marcel, Przychodny, Jakub, Obszarski, Paweł
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page
container_issue 1
container_start_page 14
container_title Electronics (Basel)
container_volume 14
creator Tumialis, Paweł
Skierkowski, Marcel
Przychodny, Jakub
Obszarski, Paweł
description In the field of embedded and edge devices, efforts have been made to make deep neural network models smaller because of the limited memory available and the low computational efficiency of such hardware. Typical model footprints are under 100 KB. However, for some applications, models of this size are still too large. In low-voltage sensors, signals must be processed, classified, or predicted with an order of magnitude less memory. Model downsizing can be performed by limiting the number of model parameters or by quantizing their weights. Both types of operation have a negative impact on the accuracy of the deep network. This study tested the effect of these model downscaling techniques on accuracy. The main idea was to reduce neural network models to 3 k parameters or fewer. Tests were conducted on three different neural network architectures in the context of three separate research problems, modeling realistic tasks for small networks. The impact of the reduction on network accuracy depends mainly on the network's initial size: for a network reduced from 40 k parameters, accuracy dropped by 16 percentage points, and for a network with 20 k parameters, by 8 percentage points. To obtain the best results, knowledge distillation and quantization-aware training were used during training. Thanks to this, the accuracy of the 4-bit networks did not differ significantly from that of the 8-bit ones, and both were approximately four percentage points worse than the full-precision networks. For the fully connected network, synthesis to an ASIC (application-specific integrated circuit) was also performed to demonstrate the reduction in the silicon area occupied by the model. The 4-bit quantization reduces the silicon area footprint by 90%.
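The abstract does not specify the quantization scheme the authors used, so as a generic illustration only: the sketch below applies uniform symmetric per-tensor weight quantization at 8 and 4 bits to a ~3 k-parameter weight vector (the scale of network targeted in the paper) and compares the packed storage size and the round-trip error. The function names, the random weights, and the per-tensor scheme are all assumptions for illustration, not the paper's method.

```python
import numpy as np

def quantize_weights(w, bits):
    """Uniform symmetric per-tensor quantization (a generic sketch,
    not necessarily the scheme used in the article)."""
    qmax = 2 ** (bits - 1) - 1               # 7 for 4-bit, 127 for 8-bit
    scale = np.max(np.abs(w)) / qmax          # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map integer codes back to approximate float weights."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.1, size=3000).astype(np.float32)  # ~3 k parameters

for bits in (8, 4):
    q, scale = quantize_weights(w, bits)
    max_err = np.max(np.abs(dequantize(q, scale) - w))
    kib = len(w) * bits / 8 / 1024            # storage if packed at `bits` per weight
    print(f"{bits}-bit: {kib:.2f} KiB packed, max abs error {max_err:.4f}")
```

Halving the bit width halves the packed weight storage, which is the mechanism behind the memory and silicon-area savings the abstract reports; the round-trip error grows in exchange, which is why quantization-aware training is needed to recover accuracy.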
doi_str_mv 10.3390/electronics14010014
format Article
publisher Basel: MDPI AG
rights 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
orcidid https://orcid.org/0000-0002-3277-1104
https://orcid.org/0009-0009-0740-4546
fulltext fulltext
identifier ISSN: 2079-9292
ispartof Electronics (Basel), 2025-01, Vol.14 (1), p.14
issn 2079-9292
2079-9292
language eng
recordid cdi_crossref_primary_10_3390_electronics14010014
source MDPI - Multidisciplinary Digital Publishing Institute; EZB-FREE-00999 freely available EZB journals
subjects Accuracy
Application specific integrated circuits
Artificial intelligence
Artificial neural networks
Classification
Data processing
Datasets
Downsizing
Embedded systems
Energy consumption
Memory devices
Neural networks
Parameters
Sensors
Silicon
title The Impact of 8- and 4-Bit Quantization on the Accuracy and Silicon Area Footprint of Tiny Neural Networks
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-26T13%3A15%3A16IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=The%20Impact%20of%208-%20and%204-Bit%20Quantization%20on%20the%20Accuracy%20and%20Silicon%20Area%20Footprint%20of%20Tiny%20Neural%20Networks&rft.jtitle=Electronics%20(Basel)&rft.au=Tumialis,%20Pawe%C5%82&rft.date=2025-01-01&rft.volume=14&rft.issue=1&rft.spage=14&rft.pages=14-&rft.issn=2079-9292&rft.eissn=2079-9292&rft_id=info:doi/10.3390/electronics14010014&rft_dat=%3Cproquest_cross%3E3153797394%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3153797394&rft_id=info:pmid/&rfr_iscdi=true