Sideways: Depth-Parallel Training of Video Models
Saved in:
Main authors:
Format: Article
Language: English
Subjects:
Online access: Order full text
Abstract: We propose Sideways, an approximate backpropagation scheme for training video models. In standard backpropagation, the gradients and activations at every computation step through the model are temporally synchronized. The forward activations need to be stored until the backward pass is executed, preventing inter-layer (depth) parallelization. However, can we leverage smooth, redundant input streams such as videos to develop a more efficient training scheme? Here, we explore an alternative to backpropagation: we overwrite network activations whenever new ones, i.e., from new frames, become available. This more gradual accumulation of information from both passes breaks the precise correspondence between gradients and activations, leading to theoretically noisier weight updates. Counter-intuitively, we show that Sideways training of deep convolutional video networks not only still converges, but can also potentially exhibit better generalization compared to standard synchronized backpropagation.
DOI: 10.48550/arxiv.2001.06232
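
The abstract only sketches the mechanism, so below is a minimal, hypothetical illustration of the idea rather than the authors' implementation: each layer keeps buffers for its latest input activation and its latest incoming gradient, both passes advance one layer per frame, and a weight update pairs a buffered gradient with whatever (possibly newer) activation is currently stored. The layer sizes, the toy data stream, the ReLU nonlinearity, and the plain SGD step are all assumptions made for illustration.

```python
# Minimal sketch of depth-pipelined ("Sideways"-style) training on a toy stream.
# NOT the paper's implementation; it only illustrates how overwriting buffered
# activations decouples the forward and backward passes across depth.
import numpy as np

rng = np.random.default_rng(0)

dims = [8, 16, 4]                                   # hypothetical layer widths
W = [rng.normal(scale=0.1, size=(dims[i], dims[i + 1])) for i in range(2)]

# Per-layer buffers: act[i] = latest input seen by layer i,
# out[i] = latest output of layer i, grad[i] = latest gradient sent to layer i.
act = [np.zeros(dims[i]) for i in range(2)]
out = [np.zeros(dims[i + 1]) for i in range(2)]
grad = [np.zeros(dims[i + 1]) for i in range(2)]

lr = 1e-2
target_map = rng.normal(size=(dims[0], dims[-1]))   # toy regression target
x = rng.normal(size=dims[0])

for frame in range(300):
    x = 0.95 * x + 0.05 * rng.normal(size=dims[0])  # smooth, redundant "video"
    y = x @ target_map

    # Forward tick: every layer consumes a *buffered* input (layer 1 sees
    # layer 0's output from the previous frame), i.e. depth pipelining.
    new_act = [x, out[0]]
    new_out = [np.maximum(new_act[i] @ W[i], 0.0) for i in range(2)]

    # Backward tick: gradients also travel one layer per frame, using the
    # buffers from the previous frame instead of waiting for a full pass.
    g_top = 2.0 * (out[1] - y) / y.size
    pre1 = act[1] @ W[1]
    new_grad = [(grad[1] * (pre1 > 0)) @ W[1].T, g_top]

    # Weight update: each buffered gradient is paired with whatever activation
    # is currently stored, which may belong to a newer frame than the one the
    # gradient came from -- the approximation that makes updates noisy but
    # allows every depth to work in parallel.
    for i in range(2):
        g_pre = grad[i] * ((act[i] @ W[i]) > 0)
        W[i] -= lr * np.outer(act[i], g_pre)

    act, out, grad = new_act, new_out, new_grad
```

In this sketch the layers are stepped sequentially for clarity, but because each one reads only its own buffers from the previous frame, they could equally run concurrently on separate devices, which is the depth-parallelism the title refers to.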