Neural Network Architecture Design: Towards Low-complexity and Scalable Solutions

Bibliographic Details
Main Author: M. Javid, Alireza
Format: Dissertation
Language: English
Description
Summary: Over the past few years, deep neural networks have been at the center of attention in the machine learning literature thanks to advances in the computational capabilities of modern graphics processing units (GPUs). This progress has made it possible to train large-scale neural networks on thousands, and even millions, of training samples, achieving estimation accuracy in various applications that was simply not possible before. At the same time, the lack of a coherent theoretical understanding of neural networks has shifted the focus of current machine learning research from theoretical analysis to experimental studies on clusters of GPUs. As a result, the current deep learning literature offers little guidance for real-world scenarios in which the number of training samples is small or the computational resources are limited. In this thesis, we focus on developing new neural network architectures that take such practical constraints into account.

First, we propose a layer-wise training approach for multilayer neural networks that guarantees a reduction of the training loss as the network gets deeper. Besides being computationally efficient, this approach provides an estimate of the appropriate size of the network, i.e., the number of neurons and layers. The proposed approach also enjoys a scalable training algorithm, making it attractive for distributed learning over a network of agents. Second, we focus on designing a deep neural network architecture for small-data learning regimes, where the number of training samples is limited. To this end, we combine kernel methods with densely connected networks and demonstrate the classification capabilities of the combination in few-shot learning scenarios. Thanks to the kernel representation, the proposed approach can handle high-dimensional samples and feature vectors, since the complexity of the training algorithm is determined mainly by the number of samples rather than their dimension. Third, we focus on designing a deep neural network architecture with very low computational requirements, making it suitable for power-limited applications such as learning on edge devices. In particular, we use a combination of random weights and ReLU activation functions to achieve accurate estimation as the network gets deeper.

In the next part of the thesis, we present some applications of the proposed architectures and show how they can contribute to the current machine learning literature.
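The three architectural directions above are described only at a summary level. As a rough, illustrative sketch of the kind of layer-wise construction with random weights and ReLU activations alluded to in the summary, the Python snippet below grows a network one layer at a time, refits a simple regularized least-squares output mapping after each new layer, and keeps the layer only if the training loss decreases; the names (grow_network, fit_output_layer) and the least-squares output step are illustrative assumptions, not the algorithms developed in the thesis.

import numpy as np

def relu(x):
    # Elementwise ReLU activation.
    return np.maximum(x, 0.0)

def fit_output_layer(h, t, reg=1e-3):
    # Regularized least-squares mapping from hidden features h to targets t
    # (a simple stand-in for whatever output training rule is actually used).
    d = h.shape[1]
    return np.linalg.solve(h.T @ h + reg * np.eye(d), h.T @ t)

def grow_network(x, t, max_layers=5, width=100, seed=0):
    # Illustrative sketch only: add random-weight ReLU layers one at a time
    # and keep a layer only while the training loss keeps decreasing.
    rng = np.random.default_rng(seed)
    h = x
    layers, best_loss = [], np.inf
    for _ in range(max_layers):
        w = rng.standard_normal((h.shape[1], width)) / np.sqrt(h.shape[1])
        h_new = relu(h @ w)                   # candidate hidden layer
        o = fit_output_layer(h_new, t)        # refit the output mapping
        loss = np.mean((h_new @ o - t) ** 2)  # training loss with the new layer
        if loss >= best_loss:                 # stop once added depth no longer helps
            break
        layers.append(w)
        h, best_loss = h_new, loss
    return layers, best_loss

For example, calling grow_network on a matrix of a few hundred samples with one-hot class targets returns the list of accepted random layers together with the final training loss, stopping early once an added layer no longer reduces the loss; this mirrors, in toy form, how monitoring the loss layer by layer can also suggest an appropriate network size.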