Semantic-based topic representation using frequent semantic patterns

Bibliographic details
Published in: Knowledge-based systems, 2021-03, Vol. 216, p. 106808, Article 106808
Main authors: Kapugama Geeganage, Dakshi T.; Xu, Yue; Li, Yuefeng
Format: Article
Language: English
Description
Abstract: Topic modeling discovers the hidden topics in a document collection. Most existing topic models focus only on word usage and generate topics from word frequency and co-occurrence, without considering the meaning of the text. In this paper, we propose a novel approach that generates a semantic pattern-based topic representation, grounded in the meaning of the text, for the topics in a document collection. The proposed approach considers both the semantics and the co-occurrence of words to generate a set of frequent semantic patterns that represents each topic. The semantics are captured by matching the words in each topic with concepts in the Probase ontology. A set of frequent semantic patterns is then generated for each topic from the co-occurrence of the matched words. Hence, our approach differs from traditional topic models in that it produces meaningful frequent semantic patterns based on the ontology. The proposed topic representation was evaluated in terms of topic quality and information filtering performance against a set of state-of-the-art systems. Perplexity, coherence, and topic word distribution were examined in the topic quality evaluation. The generated frequent semantic patterns were used as features for the information filtering evaluation. Our topic representation outperformed the compared systems in all the evaluations.
ISSN: 0950-7051, 1872-7409
DOI: 10.1016/j.knosys.2021.106808
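
Illustrative note: the abstract describes two steps, matching words to ontology concepts and then mining frequent patterns from the co-occurrence of the matched words. The following is a minimal sketch of that general idea only, not the authors' implementation. It assumes a tiny hand-written word-to-concept table (`TOY_CONCEPTS`, hypothetical) in place of Probase, treats each document as one transaction, and counts concept sets by brute-force enumeration instead of a dedicated pattern-mining algorithm. The function names and the `min_support` and `max_size` parameters are likewise assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical word -> concept lookup standing in for an ontology such as Probase.
TOY_CONCEPTS = {
    "apple": "fruit",
    "banana": "fruit",
    "python": "programming language",
    "java": "programming language",
    "paris": "city",
    "london": "city",
}


def to_concepts(words, concept_map):
    """Map each word to its concept and drop words that have no match."""
    return frozenset(concept_map[w] for w in words if w in concept_map)


def frequent_patterns(transactions, min_support, max_size=3):
    """Count every concept set of up to max_size items and keep the frequent ones.

    Returns a dict mapping each pattern (a frozenset of concepts) to the number
    of transactions that contain it, for patterns seen at least min_support times.
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(transaction)
        for size in range(1, min(max_size, len(items)) + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    return {p: c for p, c in counts.items() if c >= min_support}


# Toy documents (assumed data, for illustration only).
docs = [
    ["apple", "python", "paris"],
    ["banana", "java", "london"],
    ["apple", "java"],
    ["paris", "london"],
]

transactions = [to_concepts(d, TOY_CONCEPTS) for d in docs]
patterns = frequent_patterns(transactions, min_support=2)

for pattern, support in sorted(patterns.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(pattern), support)
```

In this toy setup, "apple" and "banana" collapse into the single concept "fruit", so a pattern such as {fruit, programming language} can be frequent even though no individual word pair repeats across the documents. That is the kind of effect the abstract attributes to combining semantics with co-occurrence.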