CERD: A Comprehensive Chinese Rhetoric Dataset for Rhetorical Understanding and Generation in Essays
Main authors: | , , , , , , , , |
---|---|
Format: | Article |
Language: | eng |
Abstract: | Existing rhetorical understanding and generation datasets or corpora
primarily focus on single coarse-grained categories or fine-grained categories,
neglecting the common interrelations between different rhetorical devices by
treating them as independent sub-tasks. In this paper, we propose the Chinese
Essay Rhetoric Dataset (CERD), consisting of 4 commonly used coarse-grained
categories including metaphor, personification, hyperbole and parallelism and
23 fine-grained categories across both form and content levels. CERD is a
manually annotated and comprehensive Chinese rhetoric dataset with five
interrelated sub-tasks. Unlike previous work, our dataset aids in understanding
various rhetorical devices, recognizing corresponding rhetorical components,
and generating rhetorical sentences under given conditions, thereby improving
the author's writing proficiency and language usage skills. Extensive
experiments are conducted to demonstrate the interrelations between multiple
tasks in CERD, as well as to establish a benchmark for future research on
rhetoric. The experimental results indicate that Large Language Models achieve
the best performance across most tasks, and jointly fine-tuning with multiple
tasks further enhances performance. |
DOI: | 10.48550/arxiv.2409.19691 |