Assessing the influence of personal preferences on the choice of vocabulary for natural language generation


Bibliographic details
Published in: Information processing & management 2013-07, Vol.49 (4), p.817-832
Main authors: Hervás, Raquel; Francisco, Virginia; Gervás, Pablo
Format: Article
Language: English
Description
Abstract:
► Most NLG systems try to find a general way of generating natural language.
► We examine the influence of personal preference in the choice of vocabulary for NLG.
► We use a corpus annotated by several people to test our hypothesis.
► The results show a decrease of 40% in error when personal preferences are considered.

Referring expression generation is the part of natural language generation that decides how to refer to the entities appearing in an automatically generated text. Lexicalization is the part of this process which involves the choice of appropriate vocabulary or expressions to transform the conceptual content of a referring expression into the corresponding text in natural language. This problem presents an important challenge when we have enough knowledge to allow more than one alternative. In those cases, we need some heuristics to decide which alternatives are more appropriate in a given situation. Whereas most work on natural language generation has focused on a generic way of generating language, in this paper we explore personal preferences as a type of heuristic that has not been properly addressed. We empirically analyze the TUNA corpus, a corpus of referring expression lexicalizations, to investigate the influence of language preferences on how people lexicalize new referring expressions in different situations. We then present two corpus-based approaches to solve the problem of referring expression lexicalization, one that takes preferences into account and one that does not. The results show a decrease of 50% in the similarity error against the reference corpus when personal preferences are used to generate the final referring expression.
ISSN: 0306-4573
EISSN: 1873-5371
DOI: 10.1016/j.ipm.2013.01.006