Understanding the Distributions of Aggregation Layers in Deep Neural Networks

Bibliographic Details
Published in: IEEE Transactions on Neural Networks and Learning Systems, 2024-04, Vol. 35 (4), pp. 5536-5550
Authors: Ong, Eng-Jon; Husain, Sameed; Bober, Miroslaw
Format: Article
Language: English
Abstract: Aggregation is ubiquitous in almost all deep network models. It serves as an important mechanism for consolidating deep features into a more compact representation, increasing robustness to overfitting, and providing spatial invariance in deep networks. In particular, the proximity of global aggregation layers to the output layers of DNNs means that aggregated features directly influence the performance of a deep network. A better understanding of this relationship can be obtained using information-theoretic methods, but this requires knowledge of the distributions of the activations of aggregation layers. To this end, we propose a novel mathematical formulation for analytically modeling the probability distributions of the output values of layers involved in deep feature aggregation. An important outcome is our ability to analytically predict the Kullback-Leibler (KL) divergence of output nodes in a DNN. We also experimentally verify these theoretical predictions against empirical observations across a broad range of classification tasks and datasets.
ISSN: 2162-237X (print), 2162-2388 (electronic)
DOI: 10.1109/TNNLS.2022.3207790
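
The paper's analytic formulation is not reproduced in this record. As a loose, illustrative sketch of the kind of verification the abstract describes (not the authors' actual derivation), the Python snippet below compares the empirical distribution of a global-average-pooling aggregation layer against a Gaussian predicted from analytic moments, using a histogram-based KL-divergence estimate. The ReLU-on-standard-normal feature model and all parameter values are assumptions chosen for illustration.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

n_pos = 49            # spatial positions pooled per map (e.g. a 7x7 feature map); assumed
n_samples = 100_000   # number of pooled activations to sample; assumed

# Toy feature model (an assumption, not the paper's): ReLU applied to
# i.i.d. standard-normal pre-activations of a single channel.
features = np.maximum(rng.standard_normal((n_samples, n_pos)), 0.0)

# Global average pooling aggregates each feature map to one scalar.
pooled = features.mean(axis=1)

# Analytic moments of a rectified standard normal max(Z, 0):
# mean = 1/sqrt(2*pi), variance = 1/2 - 1/(2*pi).
m = 1.0 / np.sqrt(2.0 * np.pi)
v = 0.5 - 1.0 / (2.0 * np.pi)

# Central-limit-theorem prediction for the pooled output: N(m, v / n_pos).
predicted = norm(loc=m, scale=np.sqrt(v / n_pos))

# Histogram-based estimate of KL(empirical || predicted).
counts, edges = np.histogram(pooled, bins=100)
p = counts / counts.sum()          # empirical bin probabilities
q = np.diff(predicted.cdf(edges))  # predicted probability mass per bin
mask = (p > 0) & (q > 0)
kl = float(np.sum(p[mask] * np.log(p[mask] / q[mask])))
print(f"estimated KL(empirical || predicted Gaussian): {kl:.5f}")

Under these assumptions the estimate should come out close to zero, reflecting how well the CLT Gaussian matches the pooled activations; the paper's contribution is to derive such predicted distributions analytically rather than relying on a CLT approximation.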