HARDWARE ACCELERATED MACHINE LEARNING

Some embodiments of the present disclose relates to an apparatus with a CPU, a bus to couple the CPU to a DRAM; and a machine-learning hardware accelerator coupled to the CPU. The machine-learning accelerator comprises, among others, a plurality of operation units to perform a plurality of parallel...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: Ng, Choong, Bruestle, Jeremy
Format: Patent
Sprache:eng ; fre ; ger
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Some embodiments of the present disclose relates to an apparatus with a CPU, a bus to couple the CPU to a DRAM; and a machine-learning hardware accelerator coupled to the CPU. The machine-learning accelerator comprises, among others, a plurality of operation units to perform a plurality of parallel MAC operations in accordance with a vector MAC instruction including an operation value indicating a MAC operation, an indication of a first plurality of the real numbers of the first multidimensional array and a second plurality of the real numbers of the second multidimensional array, and permutation information; and circuitry to permute the first plurality of the real numbers of the first multidimensional array in accordance with the permutation information to generate a permuted first plurality of real numbers. Each operation unit comprises: a multiplier to multiply a first real number of the permuted first plurality of real numbers and a corresponding second real number of a second plurality of the real numbers associated with the second multidimensional array to generate a product, and an accumulator to add the product to an accumulation value to generate a result value, the first real number and the second real number each having a first bit width and the accumulation value having a second bit width at least twice the first bit width.