A Text Similarity Measurement Method Based on Singular Value Decomposition and Semantic Relevance

Bibliographic Details
Published in:	JIPS (Journal of Information Processing Systems) 2017-08, Vol. 13 (4), pp. 863-875
Authors:	Li, Xu; Yao, Chunlong; Fan, Fenglong; Yu, Xiaoqiang
Format:	Article
Language:	kor
Keywords:	Natural Language Processing; Semantic Relevance; Singular Value Decomposition; Text Representation; Text Similarity Measurement

Description
Abstract:	Traditional text similarity measures based on word-frequency vectors ignore the semantic relationships between words; together with the high dimensionality and sparsity of document vectors, this has become an obstacle to text similarity calculation. To address these problems, an improved singular value decomposition (SVD) is used to reduce the dimensionality of the text representation model and remove noise from it. The optimal number of singular values is analyzed, and the semantic relevance between words is calculated in the constructed semantic space. An inverted index construction algorithm and similarity definitions between vectors are proposed to calculate the similarity between two documents at the semantic level. Experimental results on a benchmark corpus demonstrate that the proposed method improves the F-measure.
ISSN:	1976-913X; 2092-805X
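The abstract describes projecting word-frequency vectors into a lower-dimensional semantic space with SVD and measuring word relevance and document similarity there. The sketch below illustrates only that general idea, using plain truncated SVD as in classic latent semantic analysis. It is not the authors' improved SVD, and it does not reproduce their inverted-index algorithm or their similarity definitions. The function names, the toy term-document matrix, and the choice of cosine similarity are assumptions for illustration. The rank k is fixed by hand here, whereas the paper analyzes how to choose the number of singular values.

```python
import numpy as np


def truncated_svd(term_doc: np.ndarray, k: int):
    """Rank-k SVD of a term-by-document matrix (terms as rows, documents as columns)."""
    # Singular values come back sorted in descending order, so slicing keeps the top k.
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    return u[:, :k], s[:k], vt[:k, :]


def term_vectors(u_k: np.ndarray, s_k: np.ndarray) -> np.ndarray:
    # Rows of U_k * diag(S_k): one k-dimensional vector per term.
    return u_k * s_k


def doc_vectors(s_k: np.ndarray, vt_k: np.ndarray) -> np.ndarray:
    # Columns of diag(S_k) * V_k^T, transposed: one k-dimensional vector per document.
    return (s_k[:, None] * vt_k).T


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity, defined as 0.0 when either vector is all zeros (an assumption).
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    if na == 0 or nb == 0:
        return 0.0
    return float(a @ b / (na * nb))


if __name__ == "__main__":
    # Hypothetical toy corpus. Terms (rows): car, automobile, engine, flower, petal.
    # Documents (columns): d0 = "car engine", d1 = "automobile engine", d2 = "flower petal".
    A = np.array(
        [
            [1, 0, 0],
            [0, 1, 0],
            [1, 1, 0],
            [0, 0, 1],
            [0, 0, 1],
        ],
        dtype=float,
    )
    u_k, s_k, vt_k = truncated_svd(A, k=2)
    T = term_vectors(u_k, s_k)
    D = doc_vectors(s_k, vt_k)

    print("raw cosine d0-d1:", cosine(A[:, 0], A[:, 1]))
    print("latent cosine d0-d1:", cosine(D[0], D[1]))
    print("latent cosine d0-d2:", cosine(D[0], D[2]))
    print("latent relevance car-automobile:", cosine(T[0], T[1]))
```

On this toy matrix the raw cosine between d0 and d1 is 0.5, because the two documents share only "engine". In the rank-2 space it is approximately 1.0, and "car" and "automobile" become highly related even though they never co-occur, since both appear alongside "engine". Dropping the smallest singular value is what merges them; this is the dimensionality-reduction and noise-removal effect the abstract refers to, shown here in its simplest form.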