It all starts with entities: A Salient entity topic model

Entities play an essential role in understanding textual documents, regardless of whether the documents are short, such as tweets, or long, such as news articles. In short textual documents, all entities mentioned are usually considered equally important because of the limited amount of information....

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Natural language engineering 2020-09, Vol.26 (5), p.531-549
Hauptverfasser: Wu, Chuan, Kanoulas, Evangelos, de Rijke, Maarten
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Entities play an essential role in understanding textual documents, regardless of whether the documents are short, such as tweets, or long, such as news articles. In short textual documents, all entities mentioned are usually considered equally important because of the limited amount of information. In long textual documents, however, not all entities are equally important: some are salient and others are not. Traditional entity topic models (ETMs) focus on ways to incorporate entity information into topic models to better explain the generative process of documents. However, entities are usually treated equally, without considering whether they are salient or not. In this work, we propose a novel ETM, Salient Entity Topic Model, to take salient entities into consideration in the document generation process. In particular, we model salient entities as a source of topics used to generate words in documents, in addition to the topic distribution of documents used in traditional topic models. Qualitative and quantitative analysis is performed on the proposed model. Application to entity salience detection demonstrates the effectiveness of our model compared to the state-of-the-art topic model baselines.
ISSN:1351-3249
1469-8110
DOI:10.1017/S1351324919000585