Concept based document similarity using graph model
Published in: | International journal of information technology (Singapore. Online) 2022-02, Vol.14 (1), p.311-322 |
---|---|
Main authors: | , |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Full text |
Abstract: | Ontology-based knowledge bases such as WordNet and Wikipedia are widely used to measure document similarity. However, several challenges remain, such as polysemy, synonymy, and high dimensionality. This paper proposes a novel method for calculating the similarity of text documents. The proposed system exploits an ontological framework to assess the similarity between terms correctly. A modified method for concept extraction using WordNet and Wikipedia is also proposed. Each text document is represented as a conceptual coexistence graph. An index is constructed to handle scalability and to simplify computation over large numbers of concept and term associations. Graph similarity is calculated using vertex similarity. The integrated approach can find the theme of documents based on disambiguated and extracted concepts. The approach was evaluated on the 20 Newsgroups dataset and on self-generated datasets. Results show that it significantly improves on the bag-of-words approach. |
---|---|
ISSN: | 2511-2104 2511-2112 |
DOI: | 10.1007/s41870-019-00314-w |
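
The pipeline summarized in the abstract (map terms to concepts, build a concept graph per document, compare graphs by their vertices) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not the paper's actual method: the `CONCEPTS` dictionary is a hypothetical stand-in for WordNet/Wikipedia lookup and disambiguation, the sliding `window` used to link co-occurring concepts is invented, and Jaccard overlap of vertex sets is only one possible reading of "vertex similarity".

```python
from itertools import combinations

# Hypothetical term-to-concept map; a real system would query
# WordNet/Wikipedia and disambiguate word senses instead.
CONCEPTS = {
    "car": "vehicle", "automobile": "vehicle",
    "engine": "motor", "motor": "motor",
    "road": "road", "street": "road",
    "bank": "finance", "loan": "finance",
}


def concept_graph(text, window=3):
    """Return (vertices, edges) for one document.

    Vertices are the concepts found in the text. An undirected edge links
    two distinct concepts that appear within `window` consecutive concept
    mentions (an assumed notion of co-occurrence).
    """
    concepts = [CONCEPTS[t] for t in text.lower().split() if t in CONCEPTS]
    vertices = set(concepts)
    edges = set()
    for i in range(len(concepts)):
        for a, b in combinations(concepts[i:i + window], 2):
            if a != b:
                edges.add(frozenset((a, b)))
    return vertices, edges


def vertex_similarity(g1, g2):
    """Jaccard overlap of the two graphs' vertex sets (assumed measure)."""
    v1, v2 = g1[0], g2[0]
    if not v1 and not v2:
        return 0.0
    return len(v1 & v2) / len(v1 | v2)


if __name__ == "__main__":
    d1 = concept_graph("the car engine on the road")
    d2 = concept_graph("an automobile motor in the street")
    d3 = concept_graph("bank loan")
    print(vertex_similarity(d1, d2))  # synonyms map to the same concepts
    print(vertex_similarity(d1, d3))  # no shared concepts
```

In this toy setup the first two documents share almost no words yet map to the same three concepts, which is the kind of synonymy case where a concept-level comparison and a bag-of-words comparison would diverge.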