G2Basy: A framework to improve the RNN language model and ease overfitting problem




Bibliographic details
Published in:	PLoS ONE 2021-04, Vol. 16 (4), p. e0249820, Article 0249820
Main authors:	Yuwen, Lu; Chen, Shuyu; Yuan, Xiaohan
Format:	Article
Language:	English
Subjects:	Algorithms; Analysis; Batch annealing; Big Data; Biology and Life Sciences; Coders; Computational linguistics; Computer and Information Sciences; Datasets; Euclidean geometry; Experiments; Informatics; Language; Language processing; Mathematical analysis; Morphology; Multidisciplinary Sciences; Natural language interfaces; Neural networks; Optimization; Optimization algorithms; Performance evaluation; Physical Sciences; Research and Analysis Methods; Science & Technology; Science & Technology - Other Topics; Simulated annealing; Social Sciences; Software engineering; Software upgrading; Statistical analysis; Statistics; Training; Vectors (mathematics); Words (language)

Description
Abstract:	Recurrent neural networks are an efficient way to train language models, and various RNN architectures have been proposed to improve performance. However, as network scale increases, the overfitting problem becomes more pressing. In this paper, we propose a framework, G2Basy, to speed up the training process and ease the overfitting problem. Instead of using predefined hyperparameters, we devise a gradient increasing and decreasing technique that changes two training parameters, batch size and input dropout, simultaneously by a user-defined step size. Together with a pretrained word embedding initialization procedure and the use of different optimizers at different learning rates, our framework speeds up training dramatically and improves performance compared with a benchmark model of the same scale. For the word embedding initialization, we propose the concept of "artificial features" to describe the characteristics of the obtained word embeddings. We experiment on two of the most commonly used corpora, the Penn Treebank and WikiText-2 datasets; on both, our results outperform the benchmark and show potential for further improvement. Furthermore, our framework performs better on the larger and more complicated WikiText-2 corpus than on the Penn Treebank. Compared with other state-of-the-art results, we achieve comparable results with networks hundreds of times smaller and within fewer training epochs.
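The abstract describes the batch size and input dropout schedule only at a high level. The following is a minimal Python sketch, not the authors' implementation, assuming that batch size grows and input dropout shrinks by fixed, user-defined steps at each training stage, clamped to chosen bounds. The direction of each change, the class name `StepSchedule`, and all of its parameters are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple


@dataclass
class StepSchedule:
    """Hypothetical step schedule: at each stage, batch size increases and
    input dropout decreases by user-defined steps (direction is an assumption)."""
    batch_start: int
    batch_step: int
    batch_max: int
    dropout_start: float
    dropout_step: float
    dropout_min: float

    def stages(self, num_stages: int) -> Iterator[Tuple[int, float]]:
        """Yield (batch_size, input_dropout) for each training stage."""
        batch, dropout = self.batch_start, self.dropout_start
        for _ in range(num_stages):
            # Round dropout only for display, to hide floating-point noise.
            yield batch, round(dropout, 4)
            # Change both parameters simultaneously, clamped to their bounds.
            batch = min(batch + self.batch_step, self.batch_max)
            dropout = max(dropout - self.dropout_step, self.dropout_min)


if __name__ == "__main__":
    schedule = StepSchedule(20, 20, 80, 0.5, 0.1, 0.2)
    for stage, (bs, p) in enumerate(schedule.stages(5)):
        print(f"stage {stage}: batch_size={bs}, input_dropout={p}")
```

Each yielded pair would configure the data loader and the input dropout layer for the next block of epochs. Clamping keeps the parameters within sensible ranges once the steps have been applied several times.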
ISSN:	1932-6203
DOI:	10.1371/journal.pone.0249820