A Text Similarity Measurement Method Based on Singular Value Decomposition and Semantic Relevance
Published in: | Journal of information processing systems 2017, 13(4), 46, pp. 863-875 |
---|---|
Main authors: | , , , |
Format: | Article |
Language: | English |
Keywords: | |
Online access: | Full text |
Abstract: | Traditional text similarity measures based on word-frequency vectors ignore the semantic relationships between words; together with the high dimensionality and sparsity of document vectors, this has become an obstacle to text similarity calculation. To address these problems, an improved singular value decomposition is used to reduce the dimensionality of the text representation model and to remove noise from it. The optimal number of singular values is analyzed, and the semantic relevance between words is calculated in the constructed semantic space. An inverted index construction algorithm and similarity definitions between vectors are proposed to calculate the similarity between two documents at the semantic level. Experimental results on a benchmark corpus show that the proposed method improves the F-measure. KCI Citation Count: 1 |
---|---|
ISSN: | 2092-805X, 1976-913X |
DOI: | 10.3745/JIPS.02.0067 |
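The abstract's core idea (project a sparse term-document matrix into a low-rank semantic space with SVD, then compare words and documents there) can be illustrated with classic latent semantic analysis. The sketch below is a minimal, hedged illustration, not the authors' method: it uses a plain truncated SVD rather than the paper's "improved" SVD, plain cosine similarity rather than the paper's similarity definitions, and it does not reproduce the inverted index algorithm. The rank rule in `choose_rank` (keep enough singular values to retain a fraction `energy` of the squared spectrum) is a hypothetical stand-in for the paper's analysis of the optimal number of singular values, and the toy corpus is invented for illustration.

```python
import numpy as np


def term_doc_matrix(docs):
    """Raw term-frequency matrix A: one row per term, one column per document."""
    vocab = sorted({w for d in docs for w in d.split()})
    row = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for w in doc.split():
            A[row[w], j] += 1.0
    return A, vocab


def choose_rank(s, energy=0.8):
    """Hypothetical rule (not the paper's criterion): smallest k whose top
    singular values retain `energy` of the total squared spectrum."""
    cum = np.cumsum(s ** 2) / np.sum(s ** 2)
    return min(int(np.searchsorted(cum, energy)) + 1, len(s))


def semantic_space(A, k):
    """Truncated SVD A ~ U_k S_k V_k^T; returns term and document coordinates."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    term_vecs = U[:, :k] * s[:k]     # row i: term i in the k-dim space
    doc_vecs = Vt[:k, :].T * s[:k]   # row j: document j in the same space
    return term_vecs, doc_vecs


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    return float(u @ v / (nu * nv)) if nu > 0 and nv > 0 else 0.0


# Invented toy corpus: "car" and "automobile" never co-occur.
corpus = [
    "car engine repair",
    "automobile engine repair",
    "cooking pasta recipe",
    "pasta sauce recipe",
]
A, vocab = term_doc_matrix(corpus)
s = np.linalg.svd(A, compute_uv=False)   # singular values, descending
k = choose_rank(s)
term_vecs, doc_vecs = semantic_space(A, k)
car, auto = vocab.index("car"), vocab.index("automobile")

print("k =", k)
print("raw TF cosine(doc0, doc1):", round(cosine(A[:, 0], A[:, 1]), 3))
print("LSA cosine(doc0, doc1):", round(cosine(doc_vecs[0], doc_vecs[1]), 3))
print("LSA relevance(car, automobile):",
      round(cosine(term_vecs[car], term_vecs[auto]), 3))
```

On this toy corpus the raw term-frequency cosine between the first two documents is 2/3 and the words "car" and "automobile" have zero raw similarity, while in the rank-2 space both pairs become (near-)identical and cross-topic document similarity stays near zero. This is the effect the abstract attributes to working "on the semantic level"; how much it helps on real corpora depends on the chosen rank and weighting, which this sketch does not tune.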