Numerical Approximation Capacity of Neural Networks with Bounded Parameters: Do Limits Exist, and How Can They Be Measured?
Format: Article
Language: English
Abstract: The Universal Approximation Theorem posits that neural networks can theoretically possess unlimited approximation capacity with a suitable activation function and a freely chosen or trained set of parameters. However, a more practical scenario arises when these neural parameters, especially the nonlinear weights and biases, are bounded. This leads us to question: \textbf{Does the approximation capacity of a neural network remain universal, or does it have a limit when the parameters are practically bounded? And if it has a limit, how can it be measured?}

Our theoretical study indicates that while universal approximation is theoretically feasible, in practical numerical scenarios, Deep Neural Networks (DNNs) with any analytic activation function (such as Tanh or Sigmoid) can only be approximated by a finite-dimensional vector space under a bounded nonlinear parameter space (NP space), whether in a continuous or discrete sense. Based on this study, we introduce the concepts of \textit{$\epsilon$ outer measure} and \textit{Numerical Span Dimension (NSdim)} to quantify the approximation capacity limit of a family of networks both theoretically and practically.

Furthermore, drawing on this new theoretical study and adopting a fresh perspective, we strive to understand the relationship between back-propagation neural networks and random-parameter networks (such as the Extreme Learning Machine (ELM)) with both finite and infinite width. We also aim to provide new insights into regularization, the trade-off between width and depth, parameter space, width redundancy, condensation, and other related important issues.
DOI: 10.48550/arxiv.2409.16697
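The abstract's central claim, that a family of networks with bounded nonlinear parameters numerically spans only a finite-dimensional space, can be illustrated with a small experiment. The sketch below is not taken from the paper: it assumes a single tanh random-feature (ELM-style) layer, inputs sampled on [-1, 1], weights and biases bounded in [-B, B], and an illustrative tolerance `eps` standing in for the $\epsilon$ of the $\epsilon$ outer measure; the numerical rank of the hidden-layer feature matrix is used only as a rough proxy for NSdim.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample inputs on [-1, 1]; a finite grid stands in for the continuous
# domain (an assumption of this sketch, not the paper's construction).
n_samples = 2000
X = np.linspace(-1.0, 1.0, n_samples).reshape(-1, 1)

def numerical_rank(H, eps=1e-6):
    """Count singular values above eps times the largest singular value."""
    s = np.linalg.svd(H, compute_uv=False)  # sorted in descending order
    return int(np.sum(s > eps * s[0]))

# Bound the nonlinear parameters (weights and biases) to [-B, B] and watch
# how the numerical rank of the tanh feature matrix behaves as width grows.
B = 2.0
for width in [10, 50, 200, 1000, 5000]:
    W = rng.uniform(-B, B, size=(1, width))
    b = rng.uniform(-B, B, size=(1, width))
    H = np.tanh(X @ W + b)  # feature matrix, shape (n_samples, width)
    print(f"width={width:5d}  numerical rank ~ {numerical_rank(H)}")
```

In typical runs the numerical rank stops growing long before the width does, which is the kind of width redundancy and finite numerical span the abstract describes; tightening or loosening `eps` and `B` shifts where the saturation occurs.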