A Text Similarity Measurement Method Based on Singular Value Decomposition and Semantic Relevance

Bibliographic Details
Published in: Journal of Information Processing Systems 2017, 13(4), 46, pp. 863-875
Main authors: Li, Xu; Yao, Chunlong; Fan, Fenglong; Yu, Xiaoqiang
Format: Article
Language: English
Description
Abstract: Traditional text similarity measures based on word-frequency vectors ignore the semantic relationships between words; together with the high dimensionality and sparsity of document vectors, this has become an obstacle to text similarity calculation. To address these problems, an improved singular value decomposition is used to reduce the dimensionality of the text representation model and to remove noise from it. The optimal number of singular values is analyzed, and the semantic relevance between words is calculated in the constructed semantic space. An inverted index construction algorithm and similarity definitions between vectors are proposed to calculate the similarity between two documents at the semantic level. Experimental results on a benchmark corpus demonstrate that the proposed method improves the F-measure.
KCI Citation Count: 1
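The idea summarized in the abstract, that words become comparable once a term-document matrix is projected into a low-rank semantic space, can be illustrated with a small sketch. This is not the paper's improved decomposition, whose details are not given in this record: it is a plain truncated SVD, obtained here by power iteration on the term-term matrix A·Aᵀ, and the toy corpus, the choice k = 2, and all function names are assumptions made for illustration only.

```python
import math

# Hypothetical toy corpus: "car" and "automobile" never share a document.
docs = ["car engine", "automobile engine", "fruit apple", "fruit banana"]

vocab = sorted({w for d in docs for w in d.split()})
# Term-document matrix A: one row per term, one column per document.
A = [[float(d.split().count(t)) for d in docs] for t in vocab]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu and nv else 0.0


def top_eigenpairs(M, k, iters=200):
    """Top-k eigenpairs of a symmetric PSD matrix (power iteration + deflation)."""
    n = len(M)
    M = [row[:] for row in M]  # work on a copy
    pairs = []
    for j in range(k):
        # Deterministic, non-uniform start vector (an arbitrary choice).
        v = [1.0 + 0.1 * ((i + j) % n) for i in range(n)]
        for _ in range(iters):
            w = [dot(row, v) for row in M]
            norm = math.sqrt(dot(w, w))
            v = [x / norm for x in w]
        lam = dot(v, [dot(row, v) for row in M])  # Rayleigh quotient
        pairs.append((lam, v))
        # Deflate: subtract the component that was just found.
        for a in range(n):
            for b in range(n):
                M[a][b] -= lam * v[a] * v[b]
    return pairs


def term_vectors(A, k):
    """Rows of U_k * S_k: term coordinates in the rank-k semantic space."""
    gram = [[dot(r1, r2) for r2 in A] for r1 in A]  # A * A^T
    pairs = top_eigenpairs(gram, k)
    # Eigenvalues of A * A^T are the squared singular values of A.
    return [[math.sqrt(max(lam, 0.0)) * v[i] for lam, v in pairs]
            for i in range(len(A))]


reduced = term_vectors(A, k=2)
car, auto, apple = (vocab.index(w) for w in ("car", "automobile", "apple"))

raw_sim = cosine(A[car], A[auto])                # word-frequency space
lsa_sim = cosine(reduced[car], reduced[auto])    # reduced semantic space
cross_sim = cosine(reduced[car], reduced[apple])
print(raw_sim, lsa_sim, cross_sim)
```

In this toy corpus the raw term vectors for "car" and "automobile" are orthogonal because the two words never appear in the same document, while in the rank-2 space they nearly coincide because both co-occur with "engine"; "car" and "apple" stay unrelated. How many singular values to keep is exactly the kind of choice the abstract says the paper analyzes, so k = 2 should be read as a placeholder.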
ISSN: 2092-805X, 1976-913X
DOI: 10.3745/JIPS.02.0067