Computation of Generalized Derivatives for Abs-Smooth Functions by Backward Mode Algorithmic Differentiation and Implications to Deep Learning
Format: Article
Language: English
Online access: Order full text
Abstract: Algorithmic differentiation (AD) tools make it possible to obtain gradient
information of a continuously differentiable objective function in a computationally
cheap way using the so-called backward mode. It is common practice to use the same
tools even in the absence of differentiability, although the resulting vectors
may not be generalized gradients in the sense of Clarke. The paper at hand
focuses on objectives in which the non-differentiability arises solely from the
evaluation of the absolute value function. In that case, an algebraic condition
based on the evaluation procedure of the objective is identified that
guarantees that Clarke gradients are computed correctly without requiring any
modifications of the AD tool in question. The analysis makes it possible to prove that any
standard AD tool is adequate to drive a stochastic generalized gradient descent
method for training a dense neural network with ReLU activations. The same is
true for generalized batch gradients or the full generalized gradient, provided
that the AD tool makes a deterministic and agnostic choice for the derivative
information of the absolute value at 0.
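
To make the last point concrete, here is a minimal sketch (not part of the paper) that uses PyTorch as a stand-in for a generic backward-mode AD tool. It queries the derivative value the tool reports for |x| and for ReLU(x) at the kink x = 0; the printed numbers are whatever deterministic convention the tool implements for the derivative of the absolute value at 0, which is the choice the abstract refers to.

```python
import torch

# Query the backward-mode derivative reported for |x| at the kink x = 0.
# The value returned is an element chosen by the tool's convention for
# the derivative of abs at 0, not a uniquely defined gradient.
x = torch.tensor(0.0, requires_grad=True)
y = torch.abs(x)
y.backward()
print("reported d|x|/dx at 0:", x.grad.item())

# Same query for ReLU; note ReLU(x) = (x + |x|) / 2, so its kink at 0
# is inherited from the absolute value (the abs-smooth viewpoint).
x = torch.tensor(0.0, requires_grad=True)
y = torch.relu(x)
y.backward()
print("reported dReLU/dx at 0:", x.grad.item())
```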
DOI: 10.48550/arxiv.2407.09639