Text clustering-based subject term extraction method
Main authors: | , |
---|---|
Format: | Patent |
Language: | chi ; eng |
Subjects: | |
Abstract: | The invention discloses a subject term extraction method based on text clustering. The method comprises the following steps: performing word segmentation on the text information; accumulating interference words into a stop-word lexicon and loading the segmented word set; calculating the term frequency (TF) and inverse document frequency (IDF) from the processed word documents; creating a new K-means model, training it to obtain the word frequencies of each cluster center and their predicted values, and calculating the similarity between text words using cosine similarity; outputting the K-means clustering result and each cluster set; performing LDA document topic prediction on each cluster set; using the document-to-word weight distribution, extracting the top-N topics to form a set Mi; matching the segmented text record lexicon against the set Mi; and through multi-party conjoint analysis, an unsupervised learning topic extraction method being beneficia |
---|
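
The first steps named in the abstract (stop-word filtering, TF-IDF weighting, cosine similarity, K-means clustering) can be illustrated with a minimal standard-library sketch. This is not the patent's implementation, and several parts are assumptions: a whitespace tokenizer stands in for real word segmentation, `STOP_WORDS` is a hypothetical lexicon, the K-means variant assigns documents by highest cosine similarity with a deterministic start from the first `k` documents, and the LDA step is replaced by a crude ranking of the highest-weighted words in each cluster center. All function names are illustrative.

```python
import math
from collections import Counter

# Hypothetical stop-word lexicon; the method accumulates its own interference words.
STOP_WORDS = {"the", "a", "of", "and", "is", "in", "on"}

def tokenize(text):
    """Whitespace segmentation plus stop-word removal (stand-in for a real segmenter)."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

def tfidf_vectors(docs):
    """Return the vocabulary and one L2-normalized TF-IDF vector per document."""
    tokens = [tokenize(d) for d in docs]
    vocab = sorted({w for t in tokens for w in t})
    n = len(docs)
    df = Counter(w for t in tokens for w in set(t))  # document frequency per word
    idf = {w: math.log(n / df[w]) + 1.0 for w in vocab}  # +1 keeps common words nonzero
    vectors = []
    for t in tokens:
        tf = Counter(t)
        vec = [(tf[w] / len(t)) * idf[w] if t else 0.0 for w in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors

def cosine(u, v):
    """Cosine similarity; inputs are unit length, so this reduces to a dot product."""
    return sum(a * b for a, b in zip(u, v))

def kmeans_cosine(vectors, k, iters=20):
    """Small K-means variant: assign by highest cosine, start from the first k vectors."""
    centers = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [max(range(k), key=lambda c: cosine(v, centers[c])) for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if not members:
                continue  # keep the old center if a cluster is empty
            mean = [sum(col) / len(members) for col in zip(*members)]
            norm = math.sqrt(sum(x * x for x in mean)) or 1.0
            centers[c] = [x / norm for x in mean]
    return labels, centers

def top_terms(vocab, center, n=3):
    """Top-N words by center weight (an assumed simplification, not LDA)."""
    ranked = sorted(zip(vocab, center), key=lambda p: (-p[1], p[0]))
    return [w for w, weight in ranked[:n] if weight > 0]

docs = [
    "the cat chased a mouse",
    "stock markets fell sharply",
    "a cat and a kitten",
    "markets and stock prices rose",
]
vocab, vectors = tfidf_vectors(docs)
labels, centers = kmeans_cosine(vectors, k=2)
print(labels)
for c in range(2):
    print(c, top_terms(vocab, centers[c]))
```

In a fuller pipeline, each cluster's documents would be passed to an LDA model, and the top-N topic words would form the set Mi that the segmented text is matched against; that stage is omitted here.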