Removing Neurons From Deep Neural Networks Trained With Tabular Data

Bibliographic Details
Published in: IEEE Open Journal of the Computer Society, 2024, Vol. 5, pp. 542-552
Main authors: Klemetti, Antti; Raatikainen, Mikko; Kivimaki, Juhani; Myllyaho, Lalli; Nurminen, Jukka K.
Format: Article
Language: English
Online Access: Full text
Description
Abstract: Deep neural networks bear substantial cloud computational loads and often surpass client devices' capabilities. Research has concentrated on reducing the inference burden of convolutional neural networks processing images. Unstructured pruning, which leads to sparse matrices requiring specialized hardware, has been extensively studied. However, neural networks trained with tabular data and structured pruning, which produces dense matrices handled by standard hardware, are less explored. We compare two approaches: 1) Removing neurons followed by training from scratch, and 2) Structured pruning followed by fine-tuning through additional training over a limited number of epochs. We evaluate these approaches using three models of varying sizes (1.5, 9.2, and 118.7 million parameters) from Kaggle-winning neural networks trained with tabular data. Approach 1 consistently outperformed Approach 2 in predictive performance. The models from Approach 1 had 52%, 8%, and 12% fewer parameters than the original models, with latency reductions of 18%, 5%, and 5%, respectively. Approach 2 required at least one epoch of fine-tuning to recover predictive performance, with further fine-tuning offering diminishing returns. Approach 1 yields lighter models for retraining in the presence of concept drift and avoids shifting computational load from inference to training, which is inherent in Approach 2. However, Approach 2 can be used to pinpoint the layers that have the least impact on the model's predictive performance when neurons are removed. We found that the feed-forward component of the transformer architecture used in large language models is a promising target for neuron removal.
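
To make the pruning step concrete, the sketch below illustrates the general idea of structured neuron removal on a small fully connected (tabular-style) network in PyTorch: whole output neurons of one layer, and the matching input columns of the next layer, are dropped so that the resulting weight matrices stay dense and run on standard hardware. This is only an illustrative example of the technique named in the abstract, not the authors' implementation; the prune_neurons helper, the L1-norm importance score, and the layer sizes are assumptions made for the example.

```python
# Illustrative sketch (not the paper's code): structured pruning of neurons
# from a fully connected layer, producing smaller *dense* weight matrices.
import torch
import torch.nn as nn


def prune_neurons(fc1: nn.Linear, fc2: nn.Linear, keep_ratio: float = 0.5):
    """Remove the lowest-importance output neurons of fc1 and the matching
    input columns of fc2, returning two new, smaller dense Linear layers."""
    # Rank fc1's output neurons by the L1 norm of their incoming weights
    # (one of several possible importance criteria; assumed here).
    importance = fc1.weight.abs().sum(dim=1)              # shape: (out_features,)
    n_keep = max(1, int(keep_ratio * fc1.out_features))
    keep_idx = torch.topk(importance, n_keep).indices.sort().values

    # Build smaller dense layers; no sparse matrices or special hardware needed.
    new_fc1 = nn.Linear(fc1.in_features, n_keep)
    new_fc2 = nn.Linear(n_keep, fc2.out_features)
    with torch.no_grad():
        new_fc1.weight.copy_(fc1.weight[keep_idx])        # keep selected rows
        new_fc1.bias.copy_(fc1.bias[keep_idx])
        new_fc2.weight.copy_(fc2.weight[:, keep_idx])     # drop matching columns
        new_fc2.bias.copy_(fc2.bias)
    return new_fc1, new_fc2


# Example: prune roughly 50% of the hidden neurons of a small MLP.
fc1, fc2 = nn.Linear(128, 256), nn.Linear(256, 10)
small_fc1, small_fc2 = prune_neurons(fc1, fc2, keep_ratio=0.5)
print(small_fc1.weight.shape, small_fc2.weight.shape)     # (128, 128) and (10, 128)
```

In terms of the abstract's comparison, Approach 1 would instead instantiate the smaller architecture directly and train it from scratch, while Approach 2 would take pruned layers like these and fine-tune them for a limited number of epochs.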
ISSN: 2644-1268
DOI: 10.1109/OJCS.2024.3467182