Text clustering-based subject term extraction method

Bibliographic details
Main authors: YANG ANYIN, XIAO LINYAN
Format: Patent
Language: Chinese; English
Description
Abstract: The invention discloses a subject term extraction method based on text clustering. The method comprises the following steps: performing word segmentation on the text information; accumulating interference words into a stop-word list and loading the segmented word set of the text; calculating the term frequency (TF) and inverse document frequency (IDF) from the processed word documents; creating a new K-means model, training the term frequencies of each cluster center and their predicted values, and calculating the similarity between text words with cosine similarity; outputting the K-means clustering result and each cluster set; performing LDA document-topic prediction on each cluster set; from the document-to-word weight distribution, extracting the TOPN topics to form a set Mi; and matching ("colliding") the segmented text-record lexicon against the set Mi. Through this multi-party joint analysis, the unsupervised-learning topic extraction method is beneficia…
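The abstract names the pipeline stages but gives no parameters, so the following is only a minimal standard-library Python sketch of one plausible reading, not the patented implementation. Assumptions not taken from the source: whitespace splitting stands in for real (Chinese) word segmentation; `STOP_WORDS`, the smoothed IDF formula, the cluster count `k`, the LDA hyperparameters (`alpha`, `beta`, `iters`), `top_n`, and every function name are hypothetical; K-means assigns documents to mean centroids by cosine similarity; LDA is fitted by collapsed Gibbs sampling; "TOPN topics" is read as the top-N words of each topic, pooled into the set Mi (`m_i`); and "colliding" is read as intersecting each document's words with Mi.

```python
import math
import random
from collections import Counter

# Hypothetical stop-word list standing in for the accumulated "interference words".
STOP_WORDS = {"the", "a", "of", "and", "is", "on", "in", "as"}

def preprocess(docs):
    """Whitespace split stands in for real (e.g. Chinese) word segmentation."""
    return [[w for w in d.lower().split() if w not in STOP_WORDS] for d in docs]

def tfidf(tokenized):
    """One sparse TF-IDF vector (word -> weight) per document; IDF is smoothed."""
    n = len(tokenized)
    df = Counter(w for doc in tokenized for w in set(doc))
    vecs = []
    for doc in tokenized:
        tf = Counter(doc)
        total = len(doc) or 1
        vecs.append({w: (c / total) * (math.log((1 + n) / (1 + df[w])) + 1)
                     for w, c in tf.items()})
    return vecs

def cosine(u, v):
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def kmeans_cosine(vecs, k, iters=20, seed=0):
    """K-means that assigns each document to its most cosine-similar centroid."""
    rng = random.Random(seed)
    centroids = [dict(v) for v in rng.sample(vecs, k)]
    labels = [0] * len(vecs)
    for _ in range(iters):
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vecs]
        for c in range(k):
            members = [v for v, lab in zip(vecs, labels) if lab == c]
            if not members:
                continue  # an empty cluster keeps its previous centroid
            summed = {}
            for v in members:
                for w, x in v.items():
                    summed[w] = summed.get(w, 0.0) + x
            centroids[c] = {w: s / len(members) for w, s in summed.items()}
    return labels

def lda_top_words(docs, n_topics=2, top_n=3, iters=200, alpha=0.1, beta=0.01, seed=0):
    """LDA fitted by collapsed Gibbs sampling; returns the top_n words of each topic."""
    rng = random.Random(seed)
    n_vocab = len({w for d in docs for w in d})
    z = [[rng.randrange(n_topics) for _ in d] for d in docs]  # topic of each token
    ndk = [[0] * n_topics for _ in docs]        # document -> topic counts
    nkw = [Counter() for _ in range(n_topics)]  # topic -> word counts
    nk = [0] * n_topics                         # token total per topic

    def move(d, w, t, delta):
        ndk[d][t] += delta
        nkw[t][w] += delta
        nk[t] += delta

    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            move(d, w, z[d][i], 1)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                move(d, w, z[d][i], -1)  # take the token out, then resample its topic
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + n_vocab * beta)
                           for t in range(n_topics)]
                z[d][i] = rng.choices(range(n_topics), weights=weights)[0]
                move(d, w, z[d][i], 1)
    return [[w for w, c in nkw[t].most_common() if c > 0][:top_n] for t in range(n_topics)]

def extract_subject_terms(docs, k=2, n_topics=2, top_n=3):
    """Cluster, build M_i from each cluster's LDA topics, then intersect documents with M_i."""
    tokens = preprocess(docs)
    labels = kmeans_cosine(tfidf(tokens), k)
    terms = {}
    for i in range(k):
        members = [d for d, lab in enumerate(labels) if lab == i]
        topics = lda_top_words([tokens[d] for d in members], n_topics, top_n)
        m_i = {w for words in topics for w in words}  # the topic-word set M_i
        for d in members:
            # "Collision": keep the document's own words that also occur in M_i.
            terms[d] = list(dict.fromkeys(w for w in tokens[d] if w in m_i))
    return terms

docs = [
    "the cat chased a mouse in the garden",
    "a cat and a kitten sleep in the garden",
    "stock prices rose on the market today",
    "the market fell as stock traders sold",
]
print(extract_subject_terms(docs))
```

The centroid initialization and the Gibbs sampler are both seeded, so reruns give the same result. On a real corpus, `k` and `n_topics` would need tuning, and library components such as scikit-learn's `TfidfVectorizer`, `KMeans`, and `LatentDirichletAllocation` would normally replace the hand-written loops.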