A dimensional acceleration of gradient descent-like methods, using persistent random walkers
Published in: | arXiv.org 2018-04 |
---|---|
Main author: | |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | Finding a local minimum or maximum of a function is often achieved through the gradient-descent optimization method. For a function in dimension d, computing the gradient requires evaluating d partial derivatives at each step. This method is used, for instance, in machine learning, to fit a model's parameters so as to minimize the error rate on a given data set, or in theoretical chemistry, to obtain molecular conformations. Since each step requires d partial derivatives, it can quickly become time-consuming when d grows and when each evaluation of the function is costly. If the computation time of the function to be optimized is the limiting factor, the convergence process can be accelerated using persistent random walks. For all gradient-related methods, we here propose a way to shorten the optimization process by using random walks instead of gradient computation. The acceleration acts on the dimension of the function and not on the data set size: this approach can thus be combined with algorithmic improvements based on the data set size, such as stochastic gradient descent. As shown in a previous publication, the random walk can be further optimized with persistence. We here detail the principle of the method, give an estimate of the acceleration factor, and check numerically that this estimate is valid for quadratic functions. |
---|---|
ISSN: | 2331-8422 |
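
The abstract contrasts gradient steps, which need d partial derivatives each, with random-walk steps that need only one function evaluation. The Python sketch below is one plausible reading of that idea, not the paper's exact algorithm: the function name `persistent_random_walk_minimize`, the fixed step length, the accept/reject rule, and the evaluation budget are all assumptions made for illustration. The walker draws a random unit direction, keeps moving along it as long as the function decreases (the "persistence"), and draws a new direction as soon as a step fails to improve.

```python
import numpy as np


def persistent_random_walk_minimize(f, x0, step=0.05, n_evals=5000, seed=0):
    """Illustrative persistent random walk minimizer (assumed scheme, not the paper's).

    Each iteration costs exactly one evaluation of f, independent of the dimension.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    direction = None  # no direction chosen yet

    for _ in range(n_evals):
        if direction is None:
            # Draw a direction uniformly on the unit sphere.
            v = rng.standard_normal(x.size)
            direction = v / np.linalg.norm(v)

        candidate = x + step * direction
        f_candidate = f(candidate)

        if f_candidate < fx:
            # Improvement: accept the move and keep the same direction (persistence).
            x, fx = candidate, f_candidate
        else:
            # No improvement: stay in place and draw a new direction next time.
            direction = None

    return x, fx
```

For example, calling it on the quadratic `lambda x: float(np.sum(x**2))` with `np.ones(10)` as the starting point moves the walker toward the origin. With a fixed step length the walker stalls once it is within roughly one step of the minimum, so a practical variant would probably shrink `step` over time; that refinement is left out to keep the sketch short.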