Towards Exact Gradient-based Training on Analog In-memory Computing
Main authors:
Format: Article
Language: English
Subjects:
Online access: Order full text
Abstract: Given the high economic and environmental costs of using large vision or language models, analog in-memory accelerators present a promising solution for energy-efficient AI. While inference on analog accelerators has been studied recently, the training perspective is underexplored. Recent studies have shown that stochastic gradient descent (SGD), the "workhorse" of digital AI training, converges inexactly when applied to model training on non-ideal devices. This paper puts forth a theoretical foundation for gradient-based training on analog devices. We begin by characterizing the non-convergence of SGD, which is caused by the asymmetric updates on analog devices. We then provide a lower bound on the asymptotic error, showing that this is a fundamental performance limit of SGD-based analog training rather than an artifact of our analysis. To address this issue, we study a heuristic analog training algorithm called Tiki-Taka, which has recently exhibited superior empirical performance to SGD, and rigorously show that it converges exactly to a critical point, thereby eliminating the asymptotic error. Simulations verify the correctness of the analyses.
DOI: 10.48550/arxiv.2406.12774
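
The sketch below is a minimal, illustrative rendition of the two effects the abstract describes, not code from the paper: a one-dimensional quadratic loss, an assumed soft-bounds device model (`analog_update`, with pulse granularity `DW0` and bound `TAU`), and a simplified two-tile transfer rule (`TRANSFER_EVERY`, `TRANSFER_LR`) loosely following the Tiki-Taka idea of accumulating gradient pulses on an auxiliary tile and periodically transferring them onto the main tile. All of these names and parameter choices are assumptions for illustration; the paper's device model, algorithm, and analysis are more general.

```python
# Toy sketch (illustrative assumptions, not the paper's implementation):
# a scalar "analog weight" with asymmetric soft-bounds pulse updates,
# comparing plain analog SGD with a simplified Tiki-Taka-style scheme.
import numpy as np

rng = np.random.default_rng(0)
W_STAR = 0.8           # minimizer of the toy loss f(w) = 0.5 * (w - W_STAR)**2
DW0, TAU = 0.01, 1.0   # assumed pulse granularity and soft bound of the device

def analog_update(w, grad):
    """Asymmetric soft-bounds pulse: the realized increment depends on the
    current state w, so up- and down-pulses of equal magnitude differ."""
    if grad < 0:                                    # up-pulse, saturates near +TAU
        return w + DW0 * abs(grad) * (1 - w / TAU)
    return w - DW0 * abs(grad) * (1 + w / TAU)      # down-pulse, saturates near -TAU

def noisy_grad(w):
    """Stochastic gradient of the toy quadratic loss."""
    return (w - W_STAR) + 0.3 * rng.standard_normal()

STEPS = 100_000

# 1) Plain analog SGD: the asymmetry adds a drift toward the device's
#    symmetry point (w = 0), so the iterate settles away from W_STAR.
w = 0.0
for _ in range(STEPS):
    w = analog_update(w, noisy_grad(w))
print(f"analog SGD      : w = {w:+.3f}   (target {W_STAR:+.3f})")

# 2) Tiki-Taka-style scheme (simplified): gradient pulses land on an auxiliary
#    tile A; every few steps A's content is transferred onto the main tile C,
#    which holds the weight, so C keeps moving until the average gradient vanishes.
A, C = 0.0, 0.0
TRANSFER_EVERY, TRANSFER_LR = 10, 5.0
for t in range(STEPS):
    A = analog_update(A, noisy_grad(C))             # accumulate gradients on A
    if (t + 1) % TRANSFER_EVERY == 0:
        C = analog_update(C, -TRANSFER_LR * A)      # move A's content onto C
print(f"Tiki-Taka (toy) : w = {C:+.3f}   (target {W_STAR:+.3f})")
```

Running this should show the plain analog SGD iterate settling noticeably below the target while the two-tile scheme lands much closer to it, mirroring the asymptotic-error gap described in the abstract; the exact numbers depend on the assumed device parameters and noise level.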