Hardware Acceleration of Sparse and Irregular Tensor Computations of ML Models: A Survey and Insights

Bibliographic Details
Published in:	Proceedings of the IEEE 2021-10, Vol.109 (10), p.1706-1752
Main Authors:	Dave, Shail; Baghdadi, Riyadh; Nowatzki, Tony; Avancha, Sasikanth; Shrivastava, Aviral; Li, Baoxin
Format:	Article
Language:	English
Subjects:	Accelerators; Analytical models; Co-design; Communication; Compact models; compiler optimizations; Computational efficiency; Computational modeling; Computer memory; Cost analysis; Data models; dataflow; deep learning; deep neural networks (DNNs); dimension reduction; energy efficiency; Hardware; Hardware acceleration; hardware/software/model codesign; Machine learning; machine learning (ML); Mathematical analysis; pruning; quantization; Quantization (signal); reconfigurable computing; Shape; Software; Sparsity; spatial architecture; Storage; tensor decomposition; Tensors; VLSI

Description
Summary:	Machine learning (ML) models are widely used in many important domains. For efficiently processing these computational- and memory-intensive applications, tensors of these overparameterized models are compressed by leveraging sparsity, size reduction, and quantization of tensors. Unstructured sparsity and tensors with varying dimensions yield irregular computation, communication, and memory access patterns; processing them on hardware accelerators in a conventional manner does not inherently leverage acceleration opportunities. This article provides a comprehensive survey on the efficient execution of sparse and irregular tensor computations of ML models on hardware accelerators. In particular, it discusses enhancement modules in the architecture design and the software support, categorizes different hardware designs and acceleration techniques, analyzes them in terms of hardware and execution costs, analyzes achievable accelerations for recent DNNs, and highlights further opportunities in terms of hardware/software/model codesign optimizations (inter/intramodule). The takeaways from this article include the following: understanding the key challenges in accelerating sparse, irregular shaped, and quantized tensors; understanding enhancements in accelerator systems for supporting their efficient computations; analyzing tradeoffs in opting for a specific design choice for encoding, storing, extracting, communicating, computing, and load-balancing the nonzeros; understanding how structured sparsity can improve storage efficiency and balance computations; understanding how to compile and map models with sparse tensors on the accelerators; and understanding recent design trends for efficient accelerations and further opportunities.
ISSN:	0018-9219, 1558-2256
DOI:	10.1109/JPROC.2021.3098483