E-VSM: Novel text representation model to capture context-based closeness between two text documents
Format: Conference proceedings
Language: English
Abstract: In many applications of Information Retrieval and Text Mining, there is a need for an intelligent system that calculates the closeness between two text documents. Representing a text document as a mathematical object plays a vital role in this task. The Vector Space Model is the most popular way to represent a text document in mathematical form, but it is lossy: it discards the ordering of terms in the document and, in turn, its context. Existing measures of closeness between two text documents, such as Cosine Similarity and Euclidean Distance, are efficient but do not take the context of the document into account. In this paper we propose E-VSM (Enhanced-Vector Space Model) to overcome the limitations of the original Vector Space Model, together with a new 'Density-based Clustering' approach to calculate context-based closeness between two text documents, which outperforms the state of the art in terms of accuracy. Experiments show good results, especially when the text document being compared is very close to a particular region of the target text document.
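A minimal Python sketch of the lossiness the abstract attributes to the plain Vector Space Model, assuming whitespace tokenisation and raw term-frequency weights (the abstract does not state which weighting the paper uses). It builds bag-of-words vectors for two documents that contain the same words in a different order and compares them with Cosine Similarity and Euclidean Distance. It does not implement E-VSM or the Density-based Clustering approach, which the abstract does not describe in enough detail to reproduce; the helper names `vsm_vector`, `cosine_similarity` and `euclidean_distance` are illustrative only.

```python
import math
from collections import Counter


def vsm_vector(text, vocabulary):
    """Plain bag-of-words term-frequency vector over a fixed vocabulary.

    Assumption: lowercase whitespace tokenisation, raw counts as weights.
    Term order is discarded here, which is the loss the abstract describes.
    """
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


doc_a = "dog bites man"
doc_b = "man bites dog"

# Shared vocabulary, sorted so both vectors use the same dimension order.
vocabulary = sorted(set(doc_a.lower().split()) | set(doc_b.lower().split()))
vec_a = vsm_vector(doc_a, vocabulary)
vec_b = vsm_vector(doc_b, vocabulary)

print(vocabulary)                                   # ['bites', 'dog', 'man']
print(vec_a, vec_b)                                 # [1, 1, 1] [1, 1, 1]
print(round(cosine_similarity(vec_a, vec_b), 6))    # 1.0
print(euclidean_distance(vec_a, vec_b))             # 0.0
```

Both measures report the two sentences as identical even though they mean different things, because the vectors no longer record which word came first. This is the loss of ordering and context that the abstract gives as the motivation for E-VSM.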
DOI: 10.1109/ISCO.2013.6481176