OLLA: Optimizing the Lifetime and Location of Arrays to Reduce the Memory Usage of Neural Networks
Main authors: | , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | The size of deep neural networks has grown exponentially in recent years.
Unfortunately, hardware devices have not kept pace with the rapidly increasing
memory requirements. To cope with this, researchers have turned to techniques
such as spilling and recomputation, which increase training time, or reduced
precision and model pruning, which can affect model accuracy. We present OLLA,
an algorithm that optimizes the lifetime and memory location of the tensors
used to train neural networks. Our method reduces the memory usage of existing
neural networks, without needing any modification to the models or their
training procedures. We formulate the problem as a joint integer linear program
(ILP). We present several techniques to simplify the encoding of the problem,
and enable our approach to scale to the size of state-of-the-art neural
networks using an off-the-shelf ILP solver. We experimentally demonstrate that
OLLA takes only minutes, if not seconds, to allow the training of neural networks
using one-third less memory on average. |
---|---|
DOI: | 10.48550/arxiv.2210.12924 |