SYSTEMS AND METHODS FOR PERFORMING INSTRUCTIONS TO TRANSFORM MATRICES INTO ROW-INTERLEAVED FORMAT


Detailed description

Bibliographic details
Main authors: Toll, Bret; Heinecke, Alexander F; Hughes, Christopher J; Sade, Raanan; Charney, Mark J; Valentine, Robert; Ould-Ahmed-Vall, ElMoustapha
Format: Patent
Language: English; French; German
Description
Summary: Disclosed embodiments relate to systems and methods for performing instructions to transform matrices into a row-interleaved format. In one example, an apparatus comprises: a plurality of registers, each register of the plurality of registers to store a plurality of matrix data elements; and matrix processing circuitry to execute a matrix processing instruction to multiply a first source tile of a first source matrix and a second source tile of a second source matrix, the first source tile comprising rows and columns of a first subset of data elements of the first source matrix and the second source tile comprising rows and columns of a second subset of data elements of the second source matrix. The matrix processing circuitry comprises: circuitry to transform the first source tile by merging adjacent pairs of rows of the first source tile to generate corresponding row-interleaved data element sequences, each row-interleaved data element sequence to be loaded in a corresponding register of the plurality of registers; a set of multipliers to perform a parallel multiplication of each data element of the first subset of data elements stored in the corresponding registers of the plurality of registers with a corresponding data element of the second subset of data elements to generate a corresponding plurality of products; and accumulator circuitry to add the plurality of products to corresponding accumulated data elements of an accumulation matrix to generate corresponding result data elements of a result matrix.
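
The row-merging step in the summary can be illustrated in software. The following is a minimal sketch, not the patented circuitry: it assumes that "merging adjacent pairs of rows" means alternating elements from rows 0 and 1, rows 2 and 3, and so on, and that the tile has an even number of rows. The function name `row_interleave` and the sample tile are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def row_interleave(tile: Sequence[Sequence[int]]) -> List[List[int]]:
    """Merge each adjacent pair of rows into one interleaved sequence.

    Assumption (not stated in the record): elements alternate between the
    upper and lower row of each pair, column by column.
    """
    if len(tile) % 2 != 0:
        raise ValueError("this sketch assumes an even number of rows")

    interleaved: List[List[int]] = []
    # Pair row 0 with row 1, row 2 with row 3, and so on.
    for top, bottom in zip(tile[0::2], tile[1::2]):
        if len(top) != len(bottom):
            raise ValueError("rows in a pair must have equal length")
        merged: List[int] = []
        for a, b in zip(top, bottom):
            merged.extend((a, b))  # upper-row element first, then lower-row
        interleaved.append(merged)
    return interleaved


# Hypothetical 4x3 source tile; each output row stands in for one register.
tile = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
    [10, 11, 12],
]
result = row_interleave(tile)
print(result)  # [[1, 4, 2, 5, 3, 6], [7, 10, 8, 11, 9, 12]]
```

Under these assumptions, each output sequence is twice as long as a source row, so a tile with R rows and C columns would occupy R/2 registers of 2C elements each.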