Learning entity-centric document representations using an entity facet topic model
Published in: | Information processing & management 2020-05, Vol.57 (3), p.102216, Article 102216 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | • We propose the task of entity-centric document representation learning. • We propose a novel Entity Facet Topic Model (EFTM) to learn entity-centric document representations. • We confirm our hypothesis regarding the existence of multiple facets of an entity by analysing the learned entity facets using qualitative and quantitative analysis, and identify an effective number of facets per entity. • We demonstrate the effectiveness of EFTM in downstream applications using a multilabel classification task.
Learning semantic representations of documents is essential for various downstream applications, including text classification and information retrieval. Entities, as important sources of information, play a crucial role in supporting latent representations of documents. In this work, we hypothesize that entities are not monolithic concepts; instead, they have multiple aspects, and different documents may discuss different aspects of a given entity. Given that, we argue that from an entity-centric point of view, a document related to multiple entities should (a) be represented differently for different entities (multiple entity-centric representations), and (b) have each entity-centric representation reflect the specific aspects of the entity discussed in the document.
In this work, we pose the following research questions: (1) can we confirm that entities have multiple aspects, with different aspects reflected in different documents; (2) can we learn a representation of entity aspects from a collection of documents, and a representation of a document based on the multiple entities and their aspects as reflected in the document; (3) does this novel representation improve algorithm performance in downstream applications; and (4) what is a reasonable number of aspects per entity? To answer these questions, we model each entity using multiple aspects (entity facets; to avoid unnecessary ambiguity, we use "facet" rather than both "aspect" and "facet" throughout this work), where each entity facet is represented as a mixture of latent topics. Then, given a document associated with multiple entities, we assume multiple entity-centric representations, where each entity-centric representation is a mixture of entity facets for each entity. Finally, a novel graphical model, the Entity Facet Topic Model (EFTM), is proposed in order to learn entity-centric document representations, entity facets, and latent topics.
Through experimentation we confirm that (1) entities |
---|---|
ISSN: | 0306-4573 1873-5371 |
DOI: | 10.1016/j.ipm.2020.102216 |
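
The two-level structure described in the summary (an entity facet as a mixture of latent topics, and an entity-centric document representation as a mixture of that entity's facets) can be illustrated with a small sketch. This is not the EFTM inference procedure, which the record does not describe; it only shows how the two mixtures compose. The entity names, the number of topics, the number of facets per entity, and the Dirichlet parameter 0.5 are all assumptions made for the illustration.

```python
import random


def sample_dirichlet(alpha, size, rng):
    """Draw one vector from a symmetric Dirichlet(alpha) of dimension `size`."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def mix(weights, components):
    """Return the weighted average of equal-length probability vectors."""
    dim = len(components[0])
    return [sum(w * c[i] for w, c in zip(weights, components)) for i in range(dim)]


rng = random.Random(0)
NUM_TOPICS = 4          # hypothetical size of the latent topic space
FACETS_PER_ENTITY = 3   # hypothetical; the summary leaves this number open

entities = ["entity_a", "entity_b"]  # placeholder entity names

# Level 1: each entity facet is a mixture over latent topics.
facet_topics = {
    e: [sample_dirichlet(0.5, NUM_TOPICS, rng) for _ in range(FACETS_PER_ENTITY)]
    for e in entities
}

# Level 2: a document linked to both entities gets one representation per
# entity, each a mixture over that entity's facets.
doc_facet_mix = {e: sample_dirichlet(0.5, FACETS_PER_ENTITY, rng) for e in entities}

# Composing both levels yields an entity-specific topic distribution for the
# same document, so the document looks different from each entity's viewpoint.
doc_topics = {e: mix(doc_facet_mix[e], facet_topics[e]) for e in entities}

for e in entities:
    print(e, [round(p, 3) for p in doc_topics[e]])
```

In this sketch the facet mixtures are drawn at random; in a trained model they would be estimated from a document collection, and `doc_facet_mix` would be the entity-centric representation passed to a downstream task such as multilabel classification.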