Text clustering-based subject term extraction method

Bibliographic details
Main authors:	YANG ANYIN, XIAO LINYAN
Format:	Patent
Language:	chi ; eng
Subjects:	CALCULATING; COMPUTING; COUNTING; ELECTRIC DIGITAL DATA PROCESSING; HANDLING RECORD CARRIERS; PHYSICS; PRESENTATION OF DATA; RECOGNITION OF DATA; RECORD CARRIERS

Description
Summary:	The invention discloses a subject term extraction method based on text clustering. The method comprises the following steps: performing word segmentation on the text; accumulating interference words to form a stop-word list, and loading the set of segmented words; calculating the term frequency (TF) and inverse document frequency (IDF) from the processed documents; creating a new K-means model, training it to obtain the word frequencies of each cluster center and their predicted values, and calculating the similarity between text words using cosine similarity; outputting the K-means clustering result and each cluster set; performing LDA document topic prediction on each cluster set; from the document-to-word weight distribution, extracting the top-N topics to form a set Mi; checking the segmented text record lexicon for collisions with the set Mi; and through multi-party conjoint analysis, an unsupervised learning topic extraction method being beneficia
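
The first steps of the summary (stop-word filtering, TF-IDF weighting, cosine similarity, K-means clustering, top-N term selection per cluster) can be illustrated with a minimal sketch. This is not the patented implementation. It assumes whitespace splitting in place of a real word segmenter, a hypothetical `STOP_WORDS` list, a smoothed IDF formula, and deterministic K-means initialization from the first k documents. The LDA step is not reproduced: summed TF-IDF weights are swapped in as a simple stand-in for ranking terms within a cluster. All function names are illustrative.

```python
import math
from collections import Counter

# Hypothetical stop-word list; the patent accumulates its own from interference words.
STOP_WORDS = {"the", "a", "of", "and", "is", "to"}

def tokenize(text):
    # Whitespace split stands in for a real word segmenter (assumption).
    return [w for w in text.lower().split() if w not in STOP_WORDS]

def tfidf_vectors(docs_tokens):
    # Sparse TF-IDF vectors as dicts, using a smoothed IDF (assumption).
    n = len(docs_tokens)
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))
    vectors = []
    for tokens in docs_tokens:
        tf = Counter(tokens)
        total = len(tokens)
        vec = {}
        if total:
            for term, count in tf.items():
                idf = math.log((1 + n) / (1 + df[term])) + 1.0
                vec[term] = (count / total) * idf
        vectors.append(vec)
    return vectors

def cosine(a, b):
    # Cosine similarity between two sparse vectors; 0.0 if either is empty.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def kmeans(vectors, k, iterations=10):
    # K-means with cosine similarity; centroids start as the first k vectors.
    centroids = [dict(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                  for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if not members:
                continue  # keep the old centroid for an empty cluster
            mean = {}
            for v in members:
                for t, w in v.items():
                    mean[t] = mean.get(t, 0.0) + w / len(members)
            centroids[c] = mean
    return labels

def top_terms(vectors, labels, cluster, n):
    # Rank a cluster's terms by summed TF-IDF weight (stand-in for LDA).
    scores = Counter()
    for v, lab in zip(vectors, labels):
        if lab == cluster:
            for t, w in v.items():
                scores[t] += w
    return [t for t, _ in scores.most_common(n)]

docs = [
    "the cat chased the mouse",
    "stock markets fell sharply",
    "a cat and a kitten",
    "markets and stock prices",
]
vectors = tfidf_vectors([tokenize(d) for d in docs])
labels = kmeans(vectors, k=2)
for c in range(2):
    print(c, top_terms(vectors, labels, c, 2))
```

In a fuller pipeline, each cluster's documents would be passed to an LDA model and the top-N topic words would form the set Mi; the stand-in ranking here only shows where that step plugs in.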