Go beyond End-to-End Training: Boosting Greedy Local Learning with Context Supply
Saved in:
Main authors:
Format: Article
Language: English
Subjects:
Online access: Order full text
Abstract: Traditional end-to-end (E2E) training of deep networks necessitates storing
intermediate activations for back-propagation, resulting in a large memory
footprint on GPUs and restricted model parallelization. As an alternative,
greedy local learning partitions the network into gradient-isolated modules and
trains each module with supervision from its own local auxiliary loss, thereby
enabling asynchronous and parallel training that substantially reduces memory
cost. However, empirical experiments reveal that as the network is split into
more gradient-isolated modules, the performance of the local learning scheme
degrades substantially, severely limiting its scalability. To address this
issue, we theoretically analyze greedy local learning from the standpoint of
information theory and propose ContSup, a scheme that incorporates context
supply between isolated modules to compensate for information loss. Experiments
on benchmark datasets (i.e., CIFAR, SVHN, STL-10) achieve SOTA results and
indicate that our proposed method can significantly improve the performance of
greedy local learning with minimal memory and computational overhead, allowing
the number of isolated modules to be increased. Our code is available at
https://github.com/Tab-ct/ContSup.
DOI: 10.48550/arxiv.2312.07636
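
The abstract describes greedy local learning with gradient-isolated modules trained on local losses, plus a context-supply path between modules. Below is a minimal PyTorch sketch of that training pattern, under stated assumptions: the module class `LocalModule`, the helper `train_step`, and the choice of the raw input image as the context signal are all illustrative and are not taken from the paper or its repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LocalModule(nn.Module):
    """One gradient-isolated stage: a small feature extractor plus an
    auxiliary classifier that provides the local supervised loss.
    (Illustrative architecture, not the one used in the paper.)"""

    def __init__(self, feat_ch, ctx_ch, out_ch, num_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(feat_ch + ctx_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )
        self.aux_head = nn.Linear(out_ch, num_classes)

    def forward(self, feat, ctx):
        # Context supply (assumed form): concatenate a context tensor to the
        # incoming features so the module can recover information lost upstream.
        h = self.features(torch.cat([feat, ctx], dim=1))
        logits = self.aux_head(F.adaptive_avg_pool2d(h, 1).flatten(1))
        return h, logits


def train_step(modules, optimizers, x, y):
    """Greedy local update: each module is optimized with its own loss only;
    detach() blocks gradients from flowing between modules."""
    ctx = x.detach()          # assumption: the raw input serves as the context
    feat = x
    losses = []
    for module, opt in zip(modules, optimizers):
        feat = feat.detach()  # gradient isolation between consecutive modules
        feat, logits = module(feat, ctx)
        loss = F.cross_entropy(logits, y)
        opt.zero_grad()
        loss.backward()       # only this module's activations are live here
        opt.step()
        losses.append(loss.item())
    return losses


if __name__ == "__main__":
    num_classes, width = 10, 16
    # First module sees the 3-channel image; later ones see `width` channels.
    modules = [LocalModule(3 if i == 0 else width, 3, width, num_classes)
               for i in range(4)]
    optimizers = [torch.optim.SGD(m.parameters(), lr=0.1) for m in modules]
    x = torch.randn(8, 3, 32, 32)              # CIFAR-sized dummy batch
    y = torch.randint(0, num_classes, (8,))
    print(train_step(modules, optimizers, x, y))
```

Because each stage detaches its input, back-propagation never spans module boundaries, which is the source of the memory saving the abstract mentions; the concatenated context tensor stands in for the compensating signal that ContSup routes between isolated modules.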