DeepliteRT: Computer Vision at the Edge

The proliferation of edge devices has unlocked unprecedented opportunities for deep learning model deployment in computer vision applications. However, these complex models require considerable power, memory and compute resources that are typically not available on edge platforms. Ultra low-bit quantization presents an attractive solution to this problem by scaling down the model weights and activations from 32-bit to less than 8-bit. We implement highly optimized ultra low-bit convolution operators for ARM-based targets that outperform existing methods by up to 4.34x. Our operator is implemented within Deeplite Runtime (DeepliteRT), an end-to-end solution for the compilation, tuning, and inference of ultra low-bit models on ARM devices. Compiler passes in DeepliteRT automatically convert a fake-quantized model in full precision to a compact ultra low-bit representation, easing the process of quantized model deployment on commodity hardware. We analyze the performance of DeepliteRT on classification and detection models against optimized 32-bit floating-point, 8-bit integer, and 2-bit baselines, achieving significant speedups of up to 2.20x, 2.33x and 2.17x, respectively.
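The abstract describes scaling weights and activations down from 32-bit floats to fewer than 8 bits. As a rough illustration of the general idea, the sketch below shows a minimal affine 2-bit quantizer in plain Python. The function names and the min/max scale-and-zero-point scheme are illustrative assumptions, not DeepliteRT's actual implementation.

```python
# Illustrative sketch (not from the paper): uniform affine 2-bit quantization
# of a list of float weights, mapping each value to an integer level in
# {0, 1, 2, 3}. Real ultra low-bit runtimes pack these levels into bitplanes
# and use specialized convolution kernels; this only shows the mapping itself.

def quantize_2bit(weights):
    """Map floats to 2-bit integer levels using a per-tensor scale and offset."""
    lo, hi = min(weights), max(weights)
    levels = 2 ** 2 - 1                 # 3: highest representable 2-bit level
    scale = (hi - lo) / levels or 1.0   # guard against a constant tensor
    q = [round((w - lo) / scale) for w in weights]
    return q, scale, lo

def dequantize(q, scale, lo):
    """Recover approximate float weights from the 2-bit levels."""
    return [lo + qi * scale for qi in q]

w = [-0.9, -0.3, 0.1, 0.8]
q, scale, lo = quantize_2bit(w)
# q holds only values in {0, 1, 2, 3}; dequantize(q, scale, lo) approximates w
```

Note the trade-off this makes explicit: with only four representable levels, reconstruction error per weight can be up to half the scale step, which is why ultra low-bit deployment typically relies on quantization-aware (fake-quantized) training rather than naive post-training rounding.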

Detailed Description

Saved in:
Bibliographic Details
Main Authors: Ashfaq, Saad; Hoffman, Alexander; Mitra, Saptarshi; Sah, Sudhakar; AskariHemmat, MohammadHossein; Saboori, Ehsan
Format: Article
Language: English
Online Access: Order full text
DOI: 10.48550/arxiv.2309.10878
Published: 2023-09-19
Rights: CC BY-NC-SA 4.0 (http://creativecommons.org/licenses/by-nc-sa/4.0)
Source: arXiv.org
Subjects: Computer Science - Computer Vision and Pattern Recognition; Computer Science - Learning