Accelerating AI Applications with Sparse Matrix Compression in Halide
Published in: Journal of Signal Processing Systems, 2023-05, Vol. 95 (5), p. 609-622
Main authors: , , , ,
Format: Article
Language: English
Subjects:
Online access: Full text
Abstract: Machine learning profoundly impacts every aspect of our lives. As machine learning evolves, techniques such as deep learning keep improving its accuracy and performance. Nonetheless, large computations with large memory footprints remain a bottleneck for deep learning applications. One of the most computationally demanding DNN operations is matrix multiplication, which underlies both the convolution layer, which preserves the spatial arrangement of the image and takes image patches as input features, and the fully connected layer. Our goal is to give programmers an effective method for improving the performance of such matrix multiplication layers. Halide is an image processing programming language that separates an algorithm from its schedule; with Halide, one can improve the performance of code simply by applying built-in scheduling primitives. In this paper, we propose sparse matrix compression scheduling primitives for Halide that support several compression schemes, and we use them to speed up convolution implemented with the im2col method: compressing the matrix enhances the performance of the convolution. The proposed compression scheduling also benefits natural language processing (NLP). Word embedding models convert words into multidimensional vectors, turning symbols without intrinsic meaning into vectors that carry semantic meaning. We focus on the word representation application in FastText, in which general matrix-vector multiplication (GEMV) is one of the most computationally intensive operations; we refine the software architecture of FastText and preprocess the pretrained model ahead of time. Our experiments show that the proposed design improves both convolution and GEMV performance.
ISSN: 1939-8018, 1939-8115
DOI: 10.1007/s11265-022-01821-z
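
To make the abstract's central idea concrete, below is a minimal sketch of Halide's algorithm/schedule separation, applied to the dense matrix multiplication that underlies im2col convolution. It uses only stock Halide C++ API calls; the sparse-compression scheduling primitives proposed in the paper are extensions to Halide and are not shown here.

```cpp
#include "Halide.h"
using namespace Halide;

int main() {
    const int N = 256;
    Buffer<float> A(N, N), B(N, N);
    A.fill(1.0f);  // placeholder contents
    B.fill(1.0f);

    Var x("x"), y("y");
    RDom k(0, N);  // reduction over the shared dimension

    // Algorithm: *what* to compute -- the dense product C = A * B.
    Func C("C");
    C(x, y) = 0.0f;
    C(x, y) += A(k, y) * B(x, k);

    // Schedule: *how* to compute it, stated separately from the
    // algorithm. These are built-in Halide primitives; the paper's
    // proposed compression primitives slot in at this same stage.
    C.update().reorder(x, k, y).vectorize(x, 8).parallel(y);

    Buffer<float> out = C.realize({N, N});
    return 0;
}
```

For the GEMV side, the sketch below shows a generic CSR (compressed sparse row) matrix-vector product: the kind of compressed computation that makes sparse GEMV cheaper than its dense counterpart when most weights are zero. This is the standard textbook CSR layout, not the paper's Halide primitives or its FastText integration.

```cpp
#include <cstddef>
#include <vector>

// CSR storage: nonzeros listed row by row, plus column indices and row offsets.
struct CSR {
    std::vector<float> values;   // nonzero entries
    std::vector<int>   col_idx;  // column index of each nonzero
    std::vector<int>   row_ptr;  // row r spans [row_ptr[r], row_ptr[r+1])
};

// y = M * x, touching only the stored nonzeros of M.
std::vector<float> spmv(const CSR &M, const std::vector<float> &x) {
    std::vector<float> y(M.row_ptr.size() - 1, 0.0f);
    for (std::size_t r = 0; r + 1 < M.row_ptr.size(); ++r)
        for (int i = M.row_ptr[r]; i < M.row_ptr[r + 1]; ++i)
            y[r] += M.values[i] * x[M.col_idx[i]];
    return y;
}
```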