Techniques for balancing workloads when parallelizing multiply-accumulate computations

Bibliographic details
Author: Merrill, III, Duane George
Format: Patent
Language: English
Description
Abstract: In various embodiments, a dispatch application performs multiply-accumulate ("MAC") computations across parallel processing elements. In operation, the dispatch application determines a first quantity of iterations associated with a given MAC computation. The dispatch application then determines the maximum number of tasks that can execute concurrently across a set of parallel processing elements. Subsequently, the dispatch application causes that maximum number of tasks to execute concurrently across the set of parallel processing elements in order to perform the MAC computation. During execution, each task performs a substantially equal share of the first quantity of iterations. Relative to conventional tile-based approaches to performing MAC computations across parallel processing elements, the dispatch application can distribute iterations more evenly across the different parallel processing elements. Accordingly, the dispatch application can reduce the amount of time that parallel processing elements sit idle when performing MAC computations.
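The abstract states the scheduling idea without code. The Python sketch below illustrates it under stated assumptions, and is not the patented implementation. It assumes the MAC computation is a tiled matrix multiply whose work is counted as k-iterations per output tile. The names `Problem`, `even_split`, `tiles_touched`, and the makespan helpers are hypothetical, as are the tile sizes. The maximum number of concurrent tasks is passed in as a plain number instead of being queried from hardware. The sketch contrasts a conventional one-task-per-tile schedule with an even split of the total iteration count across that many tasks.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Problem:
    """Hypothetical GEMM shape C[m, n] += A[m, k] * B[k, n], tiled for illustration."""
    m: int
    n: int
    k: int
    blk_m: int = 128  # assumed output-tile height
    blk_n: int = 128  # assumed output-tile width
    blk_k: int = 32   # assumed depth consumed by one MAC iteration


def total_iterations(p: Problem) -> tuple[int, int]:
    """Return (number of output tiles, MAC iterations per tile)."""
    tiles = math.ceil(p.m / p.blk_m) * math.ceil(p.n / p.blk_n)
    iters_per_tile = math.ceil(p.k / p.blk_k)
    return tiles, iters_per_tile


def even_split(total_iters: int, num_tasks: int) -> list[range]:
    """Assign each task a contiguous range of global iterations.
    Range lengths differ by at most one, so every task does a near-equal share."""
    base, extra = divmod(total_iters, num_tasks)
    ranges, start = [], 0
    for t in range(num_tasks):
        length = base + (1 if t < extra else 0)
        ranges.append(range(start, start + length))
        start += length
    return ranges


def tiles_touched(r: range, iters_per_tile: int) -> list[int]:
    """Output tiles whose iterations fall inside task range r.
    A range may cross tile boundaries, yielding partial sums for several tiles."""
    if len(r) == 0:
        return []
    return list(range(r.start // iters_per_tile, (r.stop - 1) // iters_per_tile + 1))


def tile_based_makespan(tiles: int, iters_per_tile: int, num_tasks: int) -> int:
    """Conventional schedule: one task per whole tile, run in waves of num_tasks."""
    waves = math.ceil(tiles / num_tasks)
    return waves * iters_per_tile


def even_split_makespan(total_iters: int, num_tasks: int) -> int:
    """Iterations performed by the busiest task under the even split."""
    return math.ceil(total_iters / num_tasks)


if __name__ == "__main__":
    p = Problem(m=384, n=384, k=4096)
    tiles, ipt = total_iterations(p)
    total = tiles * ipt
    num_tasks = 8  # hypothetical: e.g. 4 processing elements x 2 concurrent tasks each
    for t, r in enumerate(even_split(total, num_tasks)):
        print(f"task {t}: iterations [{r.start}, {r.stop}) -> tiles {tiles_touched(r, ipt)}")
    print("tile-based makespan:", tile_based_makespan(tiles, ipt, num_tasks))
    print("even-split makespan:", even_split_makespan(total, num_tasks))
```

The example shape has 9 tiles of 128 iterations each and 8 tasks. Under the tile-based schedule it needs two waves, so the critical path is 256 iterations and 7 of the 8 tasks are idle during the second wave. Under the even split, every task performs exactly 144 iterations. The sketch only reports which tiles each task touches. It does not model how partial results for a shared tile would be combined, because the abstract does not describe that step.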