Mask More and Mask Later: Efficient Pre-training of Masked Language Models by Disentangling the [MASK] Token

The pre-training of masked language models (MLMs) consumes massive computation to achieve good results on downstream NLP tasks, resulting in a large carbon footprint. In the vanilla MLM, the virtual tokens, [MASK]s, act as placeholders and gather the contextualized information from unmasked tokens t...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	arXiv.org 2022-11
Hauptverfasser:	Liao, Baohao, Thulke, David, Hewavitharana, Sanjika, Ney, Hermann, Monz, Christof
Format:	Artikel
Sprache:	eng
Schlagworte:	Budgets Downstream effects Training
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Schreiben Sie den ersten Kommentar!