A dimensional acceleration of gradient descent-like methods, using persistent random walkers
Format: Article
Language: English
Online Access: Order full text
Summary: Finding a local minimum or maximum of a function is often achieved with the gradient-descent optimization method. For a function in dimension d, computing the gradient requires d partial derivatives at each step. This method is used, for instance, in machine learning, to fit a model's parameters so as to minimize the error rate on a given data set, or in theoretical chemistry, to obtain molecular conformations. Since each step requires d partial derivatives, the method can quickly become time-consuming when d grows and when each evaluation of the function is costly. If the computation time of the function to be optimized is the limiting factor, the convergence process can be accelerated using persistent random walks. For all gradient-related methods, we here propose a way to speed up the optimization process by using random walks instead of gradient computations. The optimization acts on the dimensional aspect of the function and not on the data-set size: this approach can thus be combined with algorithmic improvements based on the data-set size, such as stochastic gradient descent. As shown in a previous publication, the random walk can be further optimized with persistence. We here detail the principle of the method, give an estimate of the acceleration factor, and check numerically that this estimate is valid for quadratic functions.
DOI: 10.48550/arxiv.1801.04532
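
The abstract only states the principle: replace the d partial derivatives needed per gradient step with single function evaluations along a random direction that is kept (persistence) as long as it leads downhill. The sketch below is a minimal illustration of that idea on a quadratic test function, assuming a simple keep-or-redraw rule; the function names and the exact update rule are hypothetical and are not the authors' published algorithm.

```python
# Illustrative sketch (not the authors' exact scheme): compare a finite-difference
# gradient descent, which costs d + 1 function evaluations per step, with a
# persistent-random-walk descent, which costs roughly one evaluation per step.
import numpy as np

def quadratic(x, A):
    """Test function f(x) = x^T A x (quadratic, as in the numerical check)."""
    return x @ A @ x

def gradient_descent_finite_diff(f, x0, lr=0.01, eps=1e-6, n_steps=200):
    """Gradient descent with the gradient estimated by d forward differences."""
    x = x0.copy()
    d = x.size
    evals = 0
    for _ in range(n_steps):
        f0 = f(x)
        grad = np.empty(d)
        for i in range(d):
            e = np.zeros(d)
            e[i] = eps
            grad[i] = (f(x + e) - f0) / eps
        evals += d + 1            # d shifted evaluations + 1 at the current point
        x -= lr * grad
    return x, evals

def persistent_random_walk_descent(f, x0, step=0.05, n_steps=200, rng=None):
    """Descent driven by a persistent random walker: keep the current random
    direction as long as it decreases f, otherwise draw a fresh one."""
    rng = np.random.default_rng() if rng is None else rng
    x = x0.copy()
    direction = rng.standard_normal(x.size)
    direction /= np.linalg.norm(direction)
    f_curr = f(x)
    evals = 1
    for _ in range(n_steps):
        candidate = x + step * direction
        f_cand = f(candidate)
        evals += 1
        if f_cand < f_curr:       # persist: the move was downhill
            x, f_curr = candidate, f_cand
        else:                     # turn: redraw a random direction
            direction = rng.standard_normal(x.size)
            direction /= np.linalg.norm(direction)
    return x, evals

if __name__ == "__main__":
    d = 50
    rng = np.random.default_rng(0)
    A = np.diag(rng.uniform(0.5, 2.0, size=d))   # simple positive-definite quadratic
    f = lambda x: quadratic(x, A)
    x0 = rng.standard_normal(d)
    x_gd, n_gd = gradient_descent_finite_diff(f, x0)
    x_rw, n_rw = persistent_random_walk_descent(f, x0, rng=rng)
    print(f"gradient descent : f = {f(x_gd):.3e}, {n_gd} function evaluations")
    print(f"persistent walker: f = {f(x_rw):.3e}, {n_rw} function evaluations")
```

The point of the comparison is the evaluation count, not the exact trajectories: the finite-difference gradient costs d + 1 evaluations per step while the walker costs one, which is the dimensional saving the abstract refers to when function evaluations dominate the run time.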