Method of calculating topic corresponding to document in consideration of word similarity
Main authors: | , |
---|---|
Format: | Patent |
Language: | chi ; eng |
Subjects: | |
Abstract: | The invention provides a method for calculating the topic of a document while taking word similarity into account. The method comprises: constructing a topic-word knowledge base from known topics and their topic-word distributions; initializing a topic-word matrix and a document-topic matrix for the document whose topic is to be calculated; obtaining the similarity between the words in the document from the topic-word knowledge base; using that similarity to iteratively update the topic-word matrix and the document-topic matrix; and stopping once both matrices reach the convergence precision, which yields the topic of the document. The method uses non-negative matrix factorization to calculate document topics automatically in batches. During the calculation, the semantic similarity of the words and the document category information are integrated to improve the |
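
The steps named in the abstract can be illustrated with a small sketch. The record does not give the patent's update formulas, so the code below substitutes a standard graph-regularized non-negative matrix factorization with multiplicative updates, which is an assumption and not the patented rule. The function names `word_similarity` and `similarity_nmf`, the parameters `lam`, `tol` and `max_iter`, the cosine similarity over knowledge-base columns, and the toy data are all hypothetical. The document category information mentioned in the abstract is not modelled here.

```python
import numpy as np


def word_similarity(topic_word_kb):
    """Cosine similarity between words (assumed measure, not from the record).

    topic_word_kb has shape (known_topics, words); each word is represented
    by its column of weights across the known topics.
    """
    norms = np.linalg.norm(topic_word_kb, axis=0, keepdims=True)
    unit = topic_word_kb / np.maximum(norms, 1e-12)
    sim = unit.T @ unit
    np.fill_diagonal(sim, 0.0)  # self-similarity plays no role in the penalty
    return sim


def similarity_nmf(X, S, n_topics, lam=0.1, tol=1e-6, max_iter=1000, seed=0):
    """Factor X (documents x words) into W (document-topic) and H (topic-word).

    Assumed objective: ||X - W H||^2 + lam * sum_ij S_ij ||H[:, i] - H[:, j]||^2 / 2,
    which pushes similar words toward similar topic weights.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = X.shape
    # Initialize both matrices with small positive random values.
    W = rng.random((n_docs, n_topics)) + 1e-3
    H = rng.random((n_topics, n_words)) + 1e-3
    D = np.diag(S.sum(axis=1))  # degree matrix of the word-similarity graph
    eps = 1e-12                 # guards against division by zero
    for _ in range(max_iter):
        W_new = W * (X @ H.T) / (W @ H @ H.T + eps)
        H_new = H * (W_new.T @ X + lam * H @ S) / (
            W_new.T @ W_new @ H + lam * H @ D + eps
        )
        # Stop once neither matrix changes by more than the precision tol.
        delta = max(np.abs(W_new - W).max(), np.abs(H_new - H).max())
        W, H = W_new, H_new
        if delta < tol:
            break
    return W, H


# Toy data (hypothetical): 2 known topics, 4 words, 4 documents.
kb = np.array([[0.5, 0.5, 0.0, 0.0],
               [0.0, 0.0, 0.5, 0.5]])
X = np.array([[3.0, 2.0, 0.0, 0.0],
              [2.0, 3.0, 0.0, 0.0],
              [0.0, 0.0, 3.0, 2.0],
              [0.0, 0.0, 2.0, 3.0]])
S = word_similarity(kb)
W, H = similarity_nmf(X, S, n_topics=2)
topics = W.argmax(axis=1)  # topic index with the largest weight per document
print(topics)
```

Taking the largest entry in each row of `W` is one simple way to read off a document's topic; the weight `lam` controls how strongly the word similarity shapes the topic-word matrix, and setting it to zero reduces the sketch to plain non-negative matrix factorization.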