Gradient Mask: Lateral Inhibition Mechanism Improves Performance in Artificial Neural Networks
Saved in:
Main authors: | , , , |
---|---|
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Order full text |
Summary: | Lateral inhibitory connections have been observed in the cortex of the
biological brain and have been extensively studied in terms of their role in
cognitive functions. However, in the vanilla version of backpropagation in deep
learning, all gradients (which can be understood to comprise both signal and
noise gradients) flow through the network during weight updates. This may lead
to overfitting. In this work, inspired by biological lateral inhibition, we
propose Gradient Mask, which effectively filters out noise gradients during
backpropagation. This allows the learned feature information to be stored more
intensively in the network while noisy or unimportant features are filtered
out. Furthermore, we demonstrate analytically how lateral inhibition in
artificial neural networks improves the quality of propagated gradients. A new
criterion for gradient quality is proposed, which can be used as a measure
during the training of various convolutional neural networks (CNNs). Finally,
we conduct several experiments to study how Gradient Mask improves the
performance of the network both quantitatively and qualitatively.
Quantitatively, it improves accuracy in the original CNN architecture, accuracy
after pruning, and accuracy under adversarial attack. Qualitatively, a CNN
trained with Gradient Mask produces saliency maps that focus primarily on the
object of interest, which is useful for data augmentation and network
interpretability. |
---|---|
DOI: | 10.48550/arxiv.2208.06918 |
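
The abstract above describes suppressing "noise" gradients during backpropagation but does not spell out the masking rule or the gradient-quality criterion. The PyTorch sketch below is only an illustration of that general idea, not the paper's method: it registers a backward hook on an intermediate activation and zeroes the smallest-magnitude gradient elements per sample. The `keep_ratio` parameter, the top-k magnitude criterion, and the `MaskedConvNet` architecture are all hypothetical choices made for demonstration.

```python
# Minimal sketch (not the paper's exact Gradient Mask): zero out
# low-magnitude gradient elements during backpropagation, loosely
# mimicking lateral inhibition. The per-sample top-k criterion is an
# assumption for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F


def make_gradient_mask_hook(keep_ratio: float = 0.5):
    """Return a backward hook that keeps only the largest-magnitude gradients.

    keep_ratio is the fraction of gradient elements (per sample) that are
    kept; the remaining elements are treated as "noise" and masked to zero.
    """
    def hook(grad: torch.Tensor) -> torch.Tensor:
        flat = grad.flatten(start_dim=1)                     # shape (N, D)
        k = max(1, int(keep_ratio * flat.size(1)))
        # k-th largest magnitude = (D - k + 1)-th smallest magnitude
        thresh = flat.abs().kthvalue(flat.size(1) - k + 1,
                                     dim=1, keepdim=True).values
        mask = (flat.abs() >= thresh).to(grad.dtype)
        return (flat * mask).view_as(grad)
    return hook


class MaskedConvNet(nn.Module):
    """Tiny CNN that filters gradients flowing back through its last feature map."""

    def __init__(self, num_classes: int = 10, keep_ratio: float = 0.5):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, 3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        self.fc = nn.Linear(32 * 8 * 8, num_classes)
        self.keep_ratio = keep_ratio

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = F.max_pool2d(F.relu(self.conv1(x)), 2)
        h = F.max_pool2d(F.relu(self.conv2(h)), 2)
        if h.requires_grad:
            # Mask the gradient that backpropagates through this activation.
            h.register_hook(make_gradient_mask_hook(self.keep_ratio))
        return self.fc(h.flatten(start_dim=1))


if __name__ == "__main__":
    model = MaskedConvNet()
    x = torch.randn(4, 3, 32, 32)
    y = torch.randint(0, 10, (4,))
    loss = F.cross_entropy(model(x), y)
    loss.backward()  # gradients through the hooked activation are masked
```

In this sketch the mask is applied to the gradient of an intermediate activation rather than to the weight gradients themselves; where the paper applies its mask, and by what criterion, should be taken from the full text rather than from this illustration.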