ACCELERATOR FOR SPARSE-DENSE MATRIX MULTIPLICATION

Bibliographic details
Main authors: Satish, Nadathur Rajagopalan; Narayanamoorthy, Srinivasan; Suprun, Alexey; Janik, Kenneth J
Format: Patent
Language: eng; fin
Description
Summary: Disclosed embodiments relate to multiply-accumulate operations. In one example, a processor comprises fetch circuitry, a plurality of registers, and execution circuitry. The fetch circuitry is to fetch a sparse-dense matrix multiplication (SDMM) instruction from a memory. The SDMM instruction has fields to specify an opcode to indicate a sparse-dense matrix multiplication operation, a result matrix having matrix data element dimensions of M x N, a first source matrix having source matrix data element dimensions of K x N, and a second source matrix representing a sparse source matrix having source matrix data element dimensions of M x K. Matrix data elements in the K dimension of the sparse source matrix comprise zero-value data elements and remaining data elements. In the K dimension, the second source matrix is to include the remaining data elements having position values associated therewith. The plurality of registers is to store a plurality of source data elements of the first source matrix, to store the remaining data elements of the second source matrix, and to store the position values associated with the remaining data elements, each position value being associated with a location of a corresponding one of the remaining data elements within the sparse source matrix. The execution circuitry, responsive to the SDMM instruction, is to multiply the remaining data elements with corresponding data elements of the first source matrix in accordance with the position values to produce a plurality of products, and to accumulate subsets of the plurality of products to produce corresponding data elements of the result matrix.
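
For illustration only, the arithmetic summarized above can be sketched in software. The Python below is a minimal sketch, not the patented circuitry: the function names compress_rows and sdmm, the per-row list layout for the remaining (non-zero) data elements and their position values, and the sample matrices are all assumptions made for this example. The record does not say how the hardware encodes or stores the position values.

```python
def compress_rows(sparse):
    """Split each row of an M x K matrix into its non-zero values and
    the K-dimension position of each value (hypothetical layout)."""
    values, positions = [], []
    for row in sparse:
        kept = [(k, v) for k, v in enumerate(row) if v != 0]
        positions.append([k for k, _ in kept])
        values.append([v for _, v in kept])
    return values, positions


def sdmm(values, positions, dense, n_cols):
    """Multiply a compressed M x K sparse matrix by a dense K x N matrix.

    Each remaining element is multiplied by the dense row selected by its
    position value, and the products are accumulated per output element.
    """
    result = []
    for row_vals, row_pos in zip(values, positions):
        out_row = [0] * n_cols
        for v, k in zip(row_vals, row_pos):
            for n in range(n_cols):
                out_row[n] += v * dense[k][n]
        result.append(out_row)
    return result


# Assumed sample data: M = 2, K = 4, N = 2.
sparse_a = [[0, 2, 0, 0],
            [1, 0, 0, 3]]
dense_b = [[1, 2],
           [3, 4],
           [5, 6],
           [7, 8]]

vals, pos = compress_rows(sparse_a)
result_c = sdmm(vals, pos, dense_b, n_cols=2)
print(vals)      # [[2], [1, 3]]
print(pos)       # [[1], [0, 3]]
print(result_c)  # [[6, 8], [22, 26]]
```

In this sketch only three multiplications per output column are performed instead of the eight a dense M x K by K x N product would need, because the zero-value elements are never stored or visited.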