RuCoLA benchmark
Format: Dataset
Language: Russian (rus)
Abstract:

The Russian Corpus of Linguistic Acceptability (RuCoLA) is a dataset consisting of Russian-language sentences with binary acceptability judgements. It includes expert-written sentences from linguistic publications and machine-generated examples. The corpus covers a variety of language phenomena, ranging from syntax and semantics to generative model hallucinations. We release RuCoLA to facilitate the development of methods for identifying errors in natural language and create a public leaderboard to track progress on this problem.

About

In recent years, natural language processing systems have rapidly improved in quality on a number of tasks, many of which involve concepts as difficult as common sense or even general world knowledge. This trend was enabled by the emergence of large-scale self-supervised pretraining methods that form the backbone of mainstream language models such as BERT and GPT-3. Such models have surpassed human performance on canonical NLU benchmarks and have proved capable of generating texts that are hard to distinguish from those written by humans.

Despite these impressive results, modern language models are still far from perfect, particularly for the Russian language. Although passages from generative models may seem human-like at first glance, they tend to be rife with hallucinated facts or contradictory information. Furthermore, a growing number of studies have reported that even the largest language models do not properly capture various linguistic phenomena and have limited ability to make fine-grained judgments about the correct use of language.

With that in mind, we designed RuCoLA as a benchmark for evaluating the linguistic competence of Russian language models. RuCoLA follows the general concept of linguistic acceptability: unlike grammatical correctness, which relates to the structure of language, acceptability denotes whether an utterance would be considered natural by a native speaker. Thus, a grammatical sentence can be unacceptable (e.g., "Colorless green ideas sleep furiously"), but an acceptable sentence has to be grammatical. Similarly to GLUE-style and probing benchmarks (e.g., GLUE, Russian SuperGLUE, and RuSentEval), RuCoLA can be used to compare the general language understanding capabilities of neural networks or to analyze and improve the fluency and consistency of text generation models.
DOI: 10.5281/zenodo.6560846
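The sketch below illustrates how a corpus of this kind (sentences paired with binary acceptability labels) might be loaded and scored. The file name and column names ("sentence", "acceptable") are assumptions for illustration and do not necessarily match the official distribution; Matthews correlation coefficient is shown because it is the customary headline metric for CoLA-style acceptability tasks.

    # Minimal sketch: load acceptability-labelled sentences and score a
    # trivial baseline. File and column names are hypothetical.
    import pandas as pd
    from sklearn.metrics import accuracy_score, matthews_corrcoef

    # Hypothetical CSV: one sentence per row, binary label
    # (1 = acceptable, 0 = unacceptable).
    df = pd.read_csv("rucola_dev.csv")

    # Placeholder baseline: predict every sentence as acceptable.
    predictions = [1] * len(df)

    print("Accuracy:", accuracy_score(df["acceptable"], predictions))
    # MCC accounts for the class imbalance typical of acceptability corpora,
    # so a majority-class baseline scores near zero despite high accuracy.
    print("MCC:", matthews_corrcoef(df["acceptable"], predictions))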