Asymptotic statistics for multilayer perceptron with ReLU hidden units
Published in: Neurocomputing (Amsterdam), May 2019, Vol. 342, pp. 16-23
Author:
Format: Article
Language: English
Subjects:
Online access: Full text
Abstract: Deep networks are state of the art in numerous tasks, yet they are still not well understood from a statistical point of view. This article contributes to filling this gap by considering regression models involving deep multilayer perceptrons (MLPs) with rectified linear unit (ReLU) activations. Studying the statistical properties of such models is difficult, mainly because in practice they may be heavily overparameterized. For simplicity, we focus here on the sum of squared errors (SSE) cost function, the standard cost function for regression. In this framework, we study the asymptotic behavior of the difference between the SSE of estimated models and the SSE of the theoretical best model; this behavior tells us about the overfitting properties of such models. We use a recently introduced methodology for models with a loss of identifiability, i.e. where the true parameter cannot be identified uniquely. Hence, we do not have to assume that a unique parameter vector realizes the best regression function, an assumption that seems too strong for heavily overparameterized models. Our results shed new light on the overfitting behavior of MLP models.
ISSN: 0925-2312, 1872-8286
DOI: 10.1016/j.neucom.2018.11.097
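A minimal formalization of the quantities named in the abstract above; the notation ($f_\theta$ for the network, $\hat\theta_n$ for the least-squares estimator, $f^\star$ for the best regression function) is assumed here for illustration and does not appear in the record:

% Notation assumed for illustration; not taken from the record.
\[
  \mathrm{SSE}_n(f) = \sum_{i=1}^{n} \bigl( Y_i - f(X_i) \bigr)^2,
  \qquad
  f_\theta(x) = W_L\,\sigma\bigl( \cdots\, \sigma( W_1 x + b_1 ) \cdots \bigr) + b_L,
  \qquad
  \sigma(u) = \max(u, 0),
\]
\[
  \hat\theta_n \in \arg\min_\theta \mathrm{SSE}_n(f_\theta),
  \qquad
  \Delta_n = \mathrm{SSE}_n\bigl(f_{\hat\theta_n}\bigr) - \mathrm{SSE}_n\bigl(f^\star\bigr),
\]
where $f^\star$ is the theoretical best regression function, which in a heavily overparameterized MLP may be realized by a whole set of parameter vectors rather than a single one (the loss of identifiability discussed in the abstract). The asymptotic behavior of $\Delta_n$ is the object studied in the paper and quantifies the overfitting of the estimated model relative to the best one.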