Dynamic Automatic Differentiation of GPU Broadcast Kernels
Main authors: | , , , , |
---|---|
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Order full text |
Summary: | We show how forward-mode automatic differentiation (AD) can be employed within larger reverse-mode computations to dynamically differentiate broadcast operations in a GPU-friendly manner. Our technique fully exploits the broadcast Jacobian's inherent sparsity structure, and unlike a pure reverse-mode approach, this "mixed-mode" approach does not require a backwards pass over the broadcasted operation's subgraph, obviating the need for several reverse-mode-specific programmability restrictions on user-authored broadcast operations. Most notably, this approach allows broadcast fusion in primal code despite the presence of data-dependent control flow. We discuss an experiment in which a Julia implementation of our technique outperformed pure reverse-mode TensorFlow and Julia implementations for differentiating through broadcast operations within an HM-LSTM cell update calculation. |
DOI: | 10.48550/arxiv.1810.08297 |
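
To make the summary's "mixed-mode" idea concrete, below is a minimal, self-contained Julia sketch, not the paper's actual implementation and with GPU specifics omitted: the broadcasted function is evaluated forward once with two-partial dual numbers, and the reverse pass over the broadcast reduces to an elementwise multiply with the stored partials. The names `Dual2` and `mixed_mode_broadcast` are invented for this illustration.

```julia
# Minimal sketch (assumed names, not the paper's code) of mixed-mode AD
# for a fused two-argument broadcast y = f.(x1, x2).

struct Dual2{T<:Real}
    val::T                 # primal value
    partials::NTuple{2,T}  # derivatives w.r.t. the two broadcast arguments
end

Base.:+(a::Dual2, b::Dual2) = Dual2(a.val + b.val, a.partials .+ b.partials)
Base.:*(a::Dual2, b::Dual2) =
    Dual2(a.val * b.val, a.val .* b.partials .+ b.val .* a.partials)
Base.tanh(a::Dual2) = Dual2(tanh(a.val), (1 - tanh(a.val)^2) .* a.partials)
Base.:(>)(a::Dual2, x::Real) = a.val > x  # value-dependent branches are allowed

# Forward pass over the fused broadcast: one dual evaluation per element
# yields the primal output and the diagonal blocks of the broadcast Jacobian.
function mixed_mode_broadcast(f, x1::AbstractArray, x2::AbstractArray)
    T = promote_type(eltype(x1), eltype(x2))
    duals = broadcast(x1, x2) do a, b
        f(Dual2(T(a), (one(T), zero(T))), Dual2(T(b), (zero(T), one(T))))
    end
    y  = map(d -> d.val, duals)
    J1 = map(d -> d.partials[1], duals)  # elementwise ∂y/∂x1
    J2 = map(d -> d.partials[2], duals)  # elementwise ∂y/∂x2
    # Reverse pass: no tape over f's subgraph, just elementwise products.
    pullback(ȳ) = (ȳ .* J1, ȳ .* J2)
    return y, pullback
end

# Usage: f contains data-dependent control flow, yet the broadcast stays fused.
x1, x2 = rand(4), rand(4)
y, back = mixed_mode_broadcast((a, b) -> a > 0.5 ? tanh(a * b) : a + b, x1, x2)
x̄1, x̄2 = back(ones(4))  # gradients w.r.t. x1 and x2
```

Because only the elementwise partials are stored, the diagonal sparsity of the broadcast Jacobian is exploited without materializing a full matrix, and `f` may branch on its argument values since no tape is recorded over its subgraph.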