A dimensional acceleration of gradient descent-like methods, using persistent random walkers
Format: Article
Language: English
Online Access: Order full text
Summary: Finding a local minimum or maximum of a function is often achieved with the gradient-descent optimization method. For a function in dimension d, computing the gradient requires d partial derivatives at each step. This method is used, for instance, in machine learning, to fit a model's parameters so as to minimize the error rate on a given data set, or in theoretical chemistry, to obtain molecular conformations. Since each step requires d partial derivatives, the method can quickly become time-consuming when d grows and when each evaluation of the function is costly. If the computation time of the function to be optimized is the limiting factor, the convergence process can be accelerated using persistent random walks. For all gradient-related methods, we here propose a way to speed up the optimization process by using random walks instead of gradient computations. The optimization acts on the dimensional aspect of the function and not on the data-set size: this approach can thus be combined with algorithmic improvements based on the data-set size, such as stochastic gradient descent. As shown in a previous publication, the random walk can be further optimized with persistence. We here detail the principle of the method, give an estimate of the acceleration factor, and check numerically that this estimate is valid for quadratic functions.
DOI: 10.48550/arxiv.1801.04532
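
The abstract only states the principle: replace the d partial derivatives needed per gradient step with single function evaluations along a random direction that is kept (persistence) as long as it leads downhill. The sketch below is a minimal illustration of that idea on a quadratic test function, assuming a simple keep-or-redraw rule; the function names and the exact update rule are hypothetical and are not the authors' published algorithm.

```python
# Illustrative sketch (not the authors' exact scheme): compare a finite-difference
# gradient descent, which costs d + 1 function evaluations per step, with a
# persistent-random-walk descent, which costs roughly one evaluation per step.
import numpy as np

def quadratic(x, A):
    """Test function f(x) = x^T A x (quadratic, as in the numerical check)."""
    return x @ A @ x

def gradient_descent_finite_diff(f, x0, lr=0.01, eps=1e-6, n_steps=200):
    """Gradient descent with the gradient estimated by d forward differences."""
    x = x0.copy()
    d = x.size
    evals = 0
    for _ in range(n_steps):
        f0 = f(x)
        grad = np.empty(d)
        for i in range(d):
            e = np.zeros(d)
            e[i] = eps
            grad[i] = (f(x + e) - f0) / eps
        evals += d + 1            # d shifted evaluations + 1 at the current point
        x -= lr * grad
    return x, evals

def persistent_random_walk_descent(f, x0, step=0.05, n_steps=200, rng=None):
    """Descent driven by a persistent random walker: keep the current random
    direction as long as it decreases f, otherwise draw a fresh one."""
    rng = np.random.default_rng() if rng is None else rng
    x = x0.copy()
    direction = rng.standard_normal(x.size)
    direction /= np.linalg.norm(direction)
    f_curr = f(x)
    evals = 1
    for _ in range(n_steps):
        candidate = x + step * direction
        f_cand = f(candidate)
        evals += 1
        if f_cand < f_curr:       # persist: the move was downhill
            x, f_curr = candidate, f_cand
        else:                     # turn: redraw a random direction
            direction = rng.standard_normal(x.size)
            direction /= np.linalg.norm(direction)
    return x, evals

if __name__ == "__main__":
    d = 50
    rng = np.random.default_rng(0)
    A = np.diag(rng.uniform(0.5, 2.0, size=d))   # simple positive-definite quadratic
    f = lambda x: quadratic(x, A)
    x0 = rng.standard_normal(d)
    x_gd, n_gd = gradient_descent_finite_diff(f, x0)
    x_rw, n_rw = persistent_random_walk_descent(f, x0, rng=rng)
    print(f"gradient descent : f = {f(x_gd):.3e}, {n_gd} function evaluations")
    print(f"persistent walker: f = {f(x_rw):.3e}, {n_rw} function evaluations")
```

The point of the comparison is the evaluation count, not the exact trajectories: the finite-difference gradient costs d + 1 evaluations per step while the walker costs one, which is the dimensional saving the abstract refers to when function evaluations dominate the run time.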