Method of calculating topic corresponding to document in consideration of word similarity


Bibliographic details
Main authors: CHU TIANBAO, JIA XIRUI
Format: Patent
Language: chi; eng
Description
Summary: The invention provides a method of calculating the topic corresponding to a document while taking word similarity into account. The method includes the steps of: constructing a topic-word knowledge base from known topics and the topic-word distributions of those topics; initializing a topic-word matrix and a document-topic matrix for the documents whose topics are to be calculated; obtaining the similarity between the words contained in the documents from the topic-word knowledge base; using that similarity to iteratively update the topic-word matrix and the document-topic matrix; and stopping the calculation once both matrices reach the convergence precision, which yields the topic corresponding to each document. The method uses non-negative matrix factorization to calculate document topics automatically and in batches. During the calculation, the semantic similarity of the words and the document category information are integrated to improve the
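
The steps in the summary can be illustrated with a minimal sketch. The abstract does not give the patent's update rule, so this example substitutes graph-regularized non-negative matrix factorization with multiplicative updates, which is one common way to let a word-similarity matrix influence the factorization. The function names `word_similarity` and `similarity_nmf`, the cosine-similarity definition, the regularization weight `lam`, and the stopping rule are all assumptions made for illustration. The document category information mentioned in the summary is not modeled here.

```python
import numpy as np


def word_similarity(topic_word_kb):
    """Cosine similarity between words (assumed definition).

    topic_word_kb: (known_topics x vocabulary) matrix holding the
    topic-word distributions of the known topics. Each word is
    represented by its column.
    """
    norms = np.linalg.norm(topic_word_kb, axis=0, keepdims=True)
    unit = topic_word_kb / np.maximum(norms, 1e-12)
    sim = unit.T @ unit
    np.fill_diagonal(sim, 0.0)  # ignore self-similarity
    return sim


def similarity_nmf(X, S, n_topics, lam=0.1, tol=1e-4, max_iter=500, seed=0):
    """Factor X (documents x words) into W (document-topic) and
    H (topic-word), nudging similar words toward similar topic weights.

    Assumed objective: ||X - W H||^2 + lam * trace(H (D - S) H^T),
    where D is the diagonal degree matrix of S.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = X.shape
    W = rng.random((n_docs, n_topics)) + 1e-3  # document-topic matrix
    H = rng.random((n_topics, n_words)) + 1e-3  # topic-word matrix
    D = np.diag(S.sum(axis=1))
    eps = 1e-12  # guards against division by zero

    for _ in range(max_iter):
        W_old, H_old = W.copy(), H.copy()
        # Multiplicative updates keep both matrices non-negative.
        W = W * (X @ H.T) / (W @ H @ H.T + eps)
        H = H * (W.T @ X + lam * H @ S) / (W.T @ W @ H + lam * H @ D + eps)
        # Stop when neither matrix changes by more than tol.
        change = max(np.abs(W - W_old).max(), np.abs(H - H_old).max())
        if change < tol:
            break
    return W, H


# Hypothetical toy data: two known topics over a four-word vocabulary,
# and four documents given as word-count rows.
kb = np.array([[0.5, 0.5, 0.0, 0.0],
               [0.0, 0.0, 0.5, 0.5]])
X = np.array([[3, 2, 0, 0],
              [2, 3, 0, 0],
              [0, 0, 3, 2],
              [0, 0, 2, 3]], dtype=float)

S = word_similarity(kb)
W, H = similarity_nmf(X, S, n_topics=2)
doc_topics = W.argmax(axis=1)  # index of the strongest topic per document
```

In this sketch, each document's topic is read off as the column with the largest weight in its row of the document-topic matrix; a larger `lam` gives the word-similarity term more influence relative to the reconstruction error.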