GLGE: A New General Language Generation Evaluation Benchmark
Main authors:
Format: Article
Language: English
Online access: Order full text
Abstract: Multi-task benchmarks such as GLUE and SuperGLUE have driven great progress in pretraining and transfer learning in Natural Language Processing (NLP). These benchmarks mostly focus on a range of Natural Language Understanding (NLU) tasks, without considering Natural Language Generation (NLG) models. In this paper, we present the General Language Generation Evaluation (GLGE), a new multi-task benchmark for evaluating the generalization capabilities of NLG models across eight language generation tasks. For each task, we design three subtasks by difficulty (GLGE-Easy, GLGE-Medium, and GLGE-Hard), yielding 24 subtasks for comprehensively comparing model performance. To encourage research on pretraining and transfer learning for NLG models, we make GLGE publicly available and build a leaderboard with strong baselines including MASS, BART, and ProphetNet. The source code and dataset are publicly available at https://github.com/microsoft/glge.
DOI: 10.48550/arxiv.2011.11928
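To make the benchmark structure described in the abstract concrete, the following is a minimal Python sketch of the 8-task × 3-difficulty grid that yields the 24 subtasks. The task names are placeholders, since the abstract does not enumerate the eight tasks; the actual task identifiers are in the GitHub repository linked above.

```python
from itertools import product

# Hypothetical names for the eight NLG tasks (the real identifiers
# are defined in the GLGE repository, not in the abstract).
TASKS = [f"task_{i}" for i in range(1, 9)]

# The three difficulty tiers named in the abstract.
DIFFICULTIES = ["Easy", "Medium", "Hard"]

# Cross the tasks with the difficulty tiers to enumerate all subtasks.
subtasks = [f"{task}/GLGE-{level}" for task, level in product(TASKS, DIFFICULTIES)]

assert len(subtasks) == 24  # 8 tasks x 3 difficulty tiers

for name in subtasks:
    print(name)
```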