A Novel Corpus of Discourse Structure in Humans and Computers
Format: | Article |
---|---|
Language: | English |
Abstract: | We present a novel corpus of 445 human- and computer-generated documents,
comprising about 27,000 clauses, annotated for semantic clause types and
coherence relations that allow for nuanced comparison of artificial and natural
discourse modes. The corpus covers both formal and informal discourse, and
contains documents generated using fine-tuned GPT-2 (Zellers et al., 2019) and
GPT-3 (Brown et al., 2020). We showcase the usefulness of this corpus for
detailed discourse analysis of text generation by providing preliminary
evidence that less numerous, shorter and more often incoherent clause relations
are associated with lower perceived quality of computer-generated narratives
and arguments. |
---|---|
DOI: | 10.48550/arxiv.2111.05940 |