Accelerating AI Applications with Sparse Matrix Compression in Halide
Published in: Journal of Signal Processing Systems, 2023-05, Vol. 95 (5), p. 609-622
Main authors: , , , ,
Format: Article
Language: English
Subjects:
Online access: Full text
Abstract: Machine learning profoundly impacts every aspect of our lives. As machine learning evolves, techniques such as deep learning keep improving its accuracy and performance. Nonetheless, large computations with large memory footprints remain a bottleneck for deep learning applications. One of the most computationally demanding DNN operations is matrix multiplication, which underlies both the convolution layer, which preserves the spatial arrangement of the image and takes image patches as input features, and the fully connected layer. Our goal is to give programmers an effective method for improving the performance of such matrix multiplication layers. Halide is an image processing programming language that separates an algorithm from its schedule; with Halide, one can improve the performance of code simply by applying built-in scheduling primitives. In this paper, we propose sparse matrix compression scheduling primitives for Halide that support several compression schemes, and we use them to speed up convolution implemented with the im2col method: compressing the matrix enhances the performance of the convolution. The proposed compression scheduling also benefits natural language processing (NLP). Word embedding models convert words into multidimensional vectors, turning symbols without intrinsic meaning into vectors that carry semantic meaning. We focus on the word representation application in FastText, in which general matrix-vector multiplication (GEMV) is one of the most computationally intensive operations; we refine the software architecture of FastText and preprocess the pretrained model ahead of time. Our experiments show that the proposed design improves both convolution and GEMV performance.
ISSN: 1939-8018, 1939-8115
DOI: 10.1007/s11265-022-01821-z
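
To make the abstract's central idea concrete, below is a minimal sketch of Halide's algorithm/schedule separation, applied to the dense matrix multiplication that underlies im2col convolution. It uses only stock Halide C++ API calls; the sparse-compression scheduling primitives proposed in the paper are extensions to Halide and are not shown here.

```cpp
#include "Halide.h"
using namespace Halide;

int main() {
    const int N = 256;
    Buffer<float> A(N, N), B(N, N);
    A.fill(1.0f);  // placeholder contents
    B.fill(1.0f);

    Var x("x"), y("y");
    RDom k(0, N);  // reduction over the shared dimension

    // Algorithm: *what* to compute -- the dense product C = A * B.
    Func C("C");
    C(x, y) = 0.0f;
    C(x, y) += A(k, y) * B(x, k);

    // Schedule: *how* to compute it, stated separately from the
    // algorithm. These are built-in Halide primitives; the paper's
    // proposed compression primitives slot in at this same stage.
    C.update().reorder(x, k, y).vectorize(x, 8).parallel(y);

    Buffer<float> out = C.realize({N, N});
    return 0;
}
```

For the GEMV side, the sketch below shows a generic CSR (compressed sparse row) matrix-vector product: the kind of compressed computation that makes sparse GEMV cheaper than its dense counterpart when most weights are zero. This is the standard textbook CSR layout, not the paper's Halide primitives or its FastText integration.

```cpp
#include <cstddef>
#include <vector>

// CSR storage: nonzeros listed row by row, plus column indices and row offsets.
struct CSR {
    std::vector<float> values;   // nonzero entries
    std::vector<int>   col_idx;  // column index of each nonzero
    std::vector<int>   row_ptr;  // row r spans [row_ptr[r], row_ptr[r+1])
};

// y = M * x, touching only the stored nonzeros of M.
std::vector<float> spmv(const CSR &M, const std::vector<float> &x) {
    std::vector<float> y(M.row_ptr.size() - 1, 0.0f);
    for (std::size_t r = 0; r + 1 < M.row_ptr.size(); ++r)
        for (int i = M.row_ptr[r]; i < M.row_ptr[r + 1]; ++i)
            y[r] += M.values[i] * x[M.col_idx[i]];
    return y;
}
```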