Professional field-oriented on-line theme detection method

Bibliographic details
Main authors: WANG JIANRONG, YUAN XUYING, YU JIAN, YU MEI, GAO JIE, XIN WEI
Format: Patent
Language: Chinese; English
Description
Abstract: The invention discloses a professional field-oriented on-line theme detection method. The method comprises the following steps: obtaining a text vector matrix of a preprocessed text set and extracting a dictionary from the text set; modeling the text vector matrix; calculating the mixed weight p(θk|d) of a text d with respect to each theme θk, and the frequency p(w|θk) with which a feature word w appears in each theme θk; obtaining the similarity between two texts di and dj by defining the theme-model-based distance between texts as the relative entropy distance between their text vectors, and calculating a similarity matrix; compressing the text set to obtain a new text sample set; calculating a similarity matrix of the new text sample set and selecting a deviation parameter p according to that similarity matrix; combining clustering results to generate a new clustering result; calculating the distances between all texts in the original text set and the compressed classified texts, and performing classification; out
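
The similarity step named in the abstract can be illustrated with a minimal sketch. It assumes each text is already represented by its theme mixture p(θk|d), and it makes three choices the abstract does not state: the relative entropy is symmetrised, a small smoothing constant guards against zero probabilities, and similarity is taken as the negative distance. The function names and the sample mixtures are hypothetical and are not taken from the patent.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """Relative entropy D(p || q) between two discrete distributions.

    eps is a smoothing constant (an assumption, not from the abstract)
    that avoids division by zero and log(0).
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def theme_distance(p, q):
    """Symmetrised relative entropy between two theme mixtures.

    Symmetrisation is an assumption; the abstract only says that the
    distance is a relative entropy distance.
    """
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


def similarity_matrix(theme_mixtures):
    """Pairwise similarity, taken here as the negative theme distance."""
    n = len(theme_mixtures)
    return [
        [-theme_distance(theme_mixtures[i], theme_mixtures[j]) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical theme mixtures p(theta_k | d) for three texts and three themes.
mixtures = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.1, 0.8],
]
S = similarity_matrix(mixtures)
```

With these sample mixtures, the first two texts share a dominant theme and therefore score as more similar to each other than either does to the third text.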