A consensus-based global optimization method with adaptive momentum estimation
Published in: arXiv.org 2020-12
Main authors: , ,
Format: Article
Language: eng
Online access: Full text
Abstract: Objective functions in large-scale machine-learning and artificial-intelligence applications often live in high dimensions with strong non-convexity and a massive number of local minima. First-order methods, such as the stochastic gradient method and Adam, are often used to find global minima. Recently, the consensus-based optimization (CBO) method has been introduced as a gradient-free optimization method, and its convergence has been proven with dimension-dependent parameters, which may suffer from the curse of dimensionality. By replacing the isotropic geometric Brownian motion with a component-wise one, the latest improvement of the CBO method is guaranteed to converge to the global minimizer with dimension-independent parameters, although the initial data need to be well chosen. In this paper, building on the CBO method and Adam, we propose a consensus-based global optimization method with adaptive momentum estimation (Adam-CBO). Advantages of the Adam-CBO method include: (1) it is capable of finding global minima of non-convex objective functions with high success rates and low costs; (2) it can handle non-differentiable activation functions and thus approximate low-regularity functions with better accuracy. The former is verified by approximating the \(1000\)-dimensional Rastrigin function with a \(100\%\) success rate at a cost growing only linearly with respect to the dimensionality. The latter is confirmed by solving a machine-learning task for partial differential equations with low-regularity solutions, where the Adam-CBO method provides better results than the state-of-the-art method Adam. A linear stability analysis is provided to understand the asymptotic behavior of the Adam-CBO method.
ISSN: 2331-8422
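To make the abstract's ingredients concrete, the following is a rough, hypothetical sketch of a CBO-style iteration with component-wise (anisotropic) diffusion and Adam-style moment estimates on the drift toward the consensus point. It is not the authors' exact scheme; all parameter values (`lam`, `sigma`, `beta`, `beta1`, `beta2`) and the way the moments enter the update are illustrative assumptions, shown here on a low-dimensional Rastrigin function.

```python
import numpy as np

def rastrigin(x):
    """Rastrigin objective; x has shape (n_particles, d). Global minimum 0 at the origin."""
    return 10 * x.shape[1] + np.sum(x**2 - 10 * np.cos(2 * np.pi * x), axis=1)

def adam_cbo(f, d, n_particles=200, steps=2000, dt=0.01,
             lam=1.0, sigma=5.0, beta=30.0,
             beta1=0.9, beta2=0.999, eps=1e-8, seed=0):
    """Sketch of a consensus-based optimizer with Adam-like momentum (illustrative, not the paper's exact algorithm)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-3.0, 3.0, size=(n_particles, d))  # initial ensemble
    m = np.zeros_like(X)  # first-moment estimate of the drift
    v = np.zeros_like(X)  # second-moment estimate of the drift
    for t in range(1, steps + 1):
        fx = f(X)
        # Gibbs-type weights concentrating on low-objective particles
        # (shifted by the minimum for numerical stability).
        w = np.exp(-beta * (fx - fx.min()))
        xbar = (w[:, None] * X).sum(axis=0) / w.sum()  # consensus point
        drift = X - xbar
        # Adam-style bias-corrected moment estimates of the drift.
        m = beta1 * m + (1 - beta1) * drift
        v = beta2 * v + (1 - beta2) * drift**2
        mhat = m / (1 - beta1**t)
        vhat = v / (1 - beta2**t)
        # Component-wise diffusion: each coordinate gets noise scaled by
        # its own distance to the consensus point (anisotropic CBO idea).
        noise = sigma * np.abs(drift) * rng.standard_normal(X.shape) * np.sqrt(dt)
        X = X - lam * dt * mhat / (np.sqrt(vhat) + eps) + noise
    fx = f(X)
    best = fx.argmin()
    return X[best], fx[best]
```

The component-wise noise term `sigma * np.abs(drift)` is what the abstract credits with dimension-independent convergence guarantees: each coordinate's fluctuation shrinks independently as that coordinate reaches consensus, instead of being tied to the full Euclidean distance as in isotropic CBO.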