HARDWARE ACCELERATED MACHINE LEARNING
Main Authors: | , |
---|---|
Format: | Patent |
Language: | eng ; fre ; ger |
Summary: | Some embodiments of the present disclosure relate to an apparatus with a CPU, a bus to couple the CPU to a DRAM, and a machine-learning hardware accelerator coupled to the CPU. The machine-learning accelerator comprises, among other elements, a plurality of operation units to perform a plurality of parallel MAC operations in accordance with a vector MAC instruction, and circuitry to permute a first plurality of real numbers in accordance with permutation information. The vector MAC instruction includes an operation value indicating a MAC operation, an indication of a first plurality of the real numbers of the first multidimensional array and a second plurality of the real numbers of the second multidimensional array, and the permutation information. The circuitry permutes the first plurality of the real numbers of the first multidimensional array in accordance with the permutation information to generate a permuted first plurality of real numbers. Each operation unit comprises a multiplier and an accumulator. The multiplier multiplies a first real number of the permuted first plurality of real numbers by a corresponding second real number of the second plurality of the real numbers associated with the second multidimensional array to generate a product. The accumulator adds the product to an accumulation value to generate a result value. The first real number and the second real number each have a first bit width, and the accumulation value has a second bit width at least twice the first bit width. |
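
The data flow described in the summary (permute the first operands, multiply lane by lane, accumulate at a wider bit width) can be illustrated in software. The following is a minimal sketch, not the patented design: the function name `vector_mac`, the 8-bit signed integer operands, the 16-bit accumulator, and the convention that lane `i` reads element `perm[i]` of the first operand list are all assumptions chosen for illustration. The summary itself only requires that the accumulator be at least twice as wide as the operands.

```python
from typing import List, Sequence

# Assumed widths for illustration only; the summary states just that the
# accumulator width is at least twice the operand width.
IN_BITS = 8
ACC_BITS = 2 * IN_BITS


def _check_signed(value: int, bits: int) -> int:
    """Return value unchanged if it fits in a signed integer of `bits` bits."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lo <= value <= hi:
        raise OverflowError(f"{value} does not fit in {bits} signed bits")
    return value


def vector_mac(
    a: Sequence[int],     # first plurality of operands (narrow width)
    b: Sequence[int],     # second plurality of operands (narrow width)
    perm: Sequence[int],  # permutation information (assumed index list)
    acc: Sequence[int],   # accumulation values (wide width)
) -> List[int]:
    """Software model of one hypothetical vector MAC instruction."""
    if not (len(a) == len(b) == len(perm) == len(acc)):
        raise ValueError("all operands must have the same number of lanes")

    # Permutation step: lane i receives a[perm[i]] (assumed convention).
    permuted = [a[p] for p in perm]

    results = []
    # Each loop iteration stands in for one operation unit; in hardware
    # the lanes would run in parallel.
    for x, y, c in zip(permuted, b, acc):
        _check_signed(x, IN_BITS)
        _check_signed(y, IN_BITS)
        product = x * y  # multiplier
        results.append(_check_signed(c + product, ACC_BITS))  # accumulator
    return results


if __name__ == "__main__":
    out = vector_mac(
        a=[1, 2, 3, 4],
        b=[10, 20, 30, 40],
        perm=[3, 2, 1, 0],   # reverse the first operand list
        acc=[0, 0, 0, 100],
    )
    print(out)  # [40, 60, 60, 140]
```

The wider accumulator matters because the product of two 8-bit signed values can need up to 16 bits, so accumulating at the operand width would overflow almost immediately. This sketch raises an error on overflow rather than wrapping or saturating, which is a simplification and not something the summary specifies.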