Kullback-Leibler Divergence-Based Regularized Normalization for Low-Resource Tasks
| Published in: | IEEE Transactions on Artificial Intelligence, 2024-06, Vol. 5 (6), pp. 2638-2650 |
|---|---|
| Main authors: | , , |
| Format: | Article |
| Language: | eng |
| Subjects: | |
| Online access: | Order full text |
| Abstract: | Large pretrained models, such as BERT, GPT, and Wav2Vec, have demonstrated their ability to learn transferable representations for various downstream tasks. However, obtaining a substantial amount of supervised data remains a challenge due to resource and time limitations. As a solution, researchers have turned to adapting large pretrained models via techniques such as fine-tuning, linear probing, or prompt tuning in low-resource settings. Normalization techniques play a crucial role in speeding up training and improving the generalization of deep neural networks, and are widely used in style transfer, object detection, and recurrent neural networks. Despite their success in various domains, their effectiveness in low-resource NLP and speech tasks has been limited. A notable reason for this limitation is the difficulty of capturing expressiveness using the affine parameters of normalization. To address this issue, we propose a novel approach called Kullback-Leibler (KL) regularized normalization, or KL-Norm. The main objective of KL-Norm is to ensure that normalized data are well behaved and to improve generalization by reducing overfitting through a regularization loss included in the training objective. It achieves this by promoting good performance on out-of-domain distributions and by retaining relevant features while filtering out superficial features or biases present in the dataset or pretrained model. Remarkably, KL-Norm accomplishes these objectives with a minimal increase in model parameters and memory overhead. Through extensive experimental analysis, we showcase the improved accuracy and performance of KL-Norm compared to other normalization techniques on low-resource downstream NLP tasks, spanning a wide range of applications, including sentiment classification, semantic relationship characterization, semantic textual similarity, textual entailment, and paraphrase detection. Additionally, KL-Norm exhibits superior results on downstream speech tasks, specifically keyword detection and emotion classification. |
| ISSN: | 2691-4581 |
| DOI: | 10.1109/TAI.2023.3323918 |
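
The abstract above describes KL-Norm only at a high level: a normalization layer whose training objective includes a KL-divergence regularization loss. As a rough, non-authoritative illustration of that general idea (not the authors' implementation), the PyTorch sketch below layer-normalizes its input, places a diagonal Gaussian over the normalized representation, and returns a KL penalty toward a unit-Gaussian prior. The class name `KLNorm`, the affine heads, and the training-versus-inference handling are all assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class KLNorm(nn.Module):
    """Hypothetical KL-regularized normalization layer (illustrative only)."""

    def __init__(self, hidden_size: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        # Small affine heads producing a posterior mean and scale per feature.
        self.mean_head = nn.Linear(hidden_size, hidden_size)
        self.scale_head = nn.Linear(hidden_size, hidden_size)

    def forward(self, x: torch.Tensor):
        # Plain layer normalization of the incoming activations.
        normed = F.layer_norm(x, x.shape[-1:], eps=self.eps)

        # Diagonal Gaussian posterior over the normalized representation.
        mu = self.mean_head(normed)
        sigma = F.softplus(self.scale_head(normed)) + self.eps

        # Reparameterized sample during training; posterior mean at inference.
        z = mu + sigma * torch.randn_like(sigma) if self.training else mu

        # KL( N(mu, sigma^2) || N(0, I) ), averaged over batch and features,
        # to be added to the task loss as a regularizer.
        kl = 0.5 * (mu.pow(2) + sigma.pow(2) - 2.0 * sigma.log() - 1.0).mean()
        return z, kl


if __name__ == "__main__":
    # Toy usage: a batch of 4 sequences, length 10, hidden size 32.
    layer = KLNorm(hidden_size=32)
    x = torch.randn(4, 10, 32)
    z, kl = layer(x)
    print(z.shape, kl.item())
```

A training loop would then minimize something like `task_loss + beta * kl`, where the weight `beta` is an assumed hyperparameter trading off task accuracy against the strength of the regularizer.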