Professional field-oriented on-line theme detection method
Format: | Patent |
---|---|
Language: | Chinese ; English |
Summary: | The invention discloses a professional field-oriented on-line theme detection method. The method comprises the following steps: obtaining a text vector matrix of a preprocessed text set, and extracting a dictionary from the text set; modeling the text vector matrix; calculating the mixing weight $p(\theta_k \mid d)$ of a text $d$ over each theme $\theta_k$ and the probability $p(w \mid \theta_k)$ with which a feature word $w$ appears in each theme $\theta_k$; obtaining the similarity between two texts $d_i$ and $d_j$ by defining the theme-model-based distance between texts as the relative entropy distance between their text vectors, and calculating a similarity matrix; compressing the text set, thus obtaining a new text sample set; calculating a similarity matrix of the new text sample set, and selecting a deviation parameter p according to the similarity matrix; combining clustering results, thus generating a new clustering result; calculating distances between all texts in the original text set and the compressed classified texts, and performing classification; out |
---|
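The similarity step in the summary can be illustrated with a minimal sketch. It assumes each text is represented by its vector of theme mixing weights p(theta_k | d), that the relative entropy distance is symmetrised by averaging both directions, and that similarity is the negated distance. None of these choices is stated in the abstract. The names `kl_divergence`, `theme_distance`, `similarity_matrix`, and `median_preference`, and the sample weights, are hypothetical. The median rule for the parameter p is a common heuristic in affinity-propagation-style clustering, shown only as one possible reading of "selecting a deviation parameter p according to the similarity matrix".

```python
import math
import statistics


def kl_divergence(p, q, eps=1e-12):
    """Relative entropy D(p || q) between two discrete distributions.

    eps guards against log(0) when q has zero entries (an assumption,
    not something the abstract specifies).
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def theme_distance(p, q):
    """Symmetrised relative entropy between two theme-weight vectors.

    Averaging both directions is an assumption made here so that the
    resulting matrix is symmetric.
    """
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


def similarity_matrix(doc_theme):
    """Pairwise similarity, taken here as the negated theme distance."""
    n = len(doc_theme)
    return [
        [-theme_distance(doc_theme[i], doc_theme[j]) for j in range(n)]
        for i in range(n)
    ]


def median_preference(sim):
    """Median of the off-diagonal similarities.

    Hypothetical rule for choosing the parameter p; the abstract does
    not say which rule the patent uses.
    """
    n = len(sim)
    off_diagonal = [sim[i][j] for i in range(n) for j in range(n) if i != j]
    return statistics.median(off_diagonal)


# Hypothetical mixing weights p(theta_k | d) for three texts and two themes.
doc_theme = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
]

S = similarity_matrix(doc_theme)
p = median_preference(S)
```

In this toy input the first two texts share a dominant theme, so their similarity is higher (closer to zero) than the similarity between the first and third texts.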