Clustering method based on big data

Bibliographic details
Main authors:	MA XIAOXIAO, WEN DACHUAN, WU CHUNCAI, WEN BIN, YAO QINGLIN, YANG SHUHAI, FENG LIANGHUAI
Format:	Patent
Language:	Chinese; English
Keywords:	CALCULATING COMPUTING COUNTING ELECTRIC DIGITAL DATA PROCESSING PHYSICS

Description
Abstract:	The invention discloses a clustering method based on big data. The method comprises the steps of segmenting news D to obtain news S; determining whether the news S is the first news item; if yes, establishing a new category based on the news S; if not, establishing a VSM vector model for the news S and calculating the similarity between the news S and all categories of a cluster center; finding the category C with the greatest similarity to the news S; if the similarity between the news S and the category C is greater than a preset threshold, classifying the news S into the category C, and if the similarity is less than the preset threshold, establishing a new category based on the news S; calculating a similarity mean M1 of the news S and the other news in the category C, calculating a similarity mean M2 of the other news in the category C and the other news in the cluster center, and if M1 is greater than M2, updating the news S to be the new cluster center, otherwise keeping the cluster center unchanged; d
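The abstract outlines a single-pass (incremental) clustering loop. The Python sketch below illustrates that loop under several assumptions that are not stated in the record: whitespace tokenization stands in for word segmentation, the VSM vector is a plain term-frequency map, similarity is cosine similarity, and the threshold of 0.3 is an arbitrary placeholder. Reading M2 as the current center's mean similarity to the other members of C (with S counted among them) is an interpretation of the abstract's wording. The names `segment`, `vsm`, `cosine`, `mean_sim`, `Category` and `add_news` are hypothetical and do not come from the patent.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def segment(text: str) -> list[str]:
    # Hypothetical stand-in for word segmentation: lowercase + whitespace split.
    # Chinese news text would need a real segmenter here.
    return text.lower().split()


def vsm(tokens: list[str]) -> Counter:
    # VSM vector as a sparse term-frequency map (no TF-IDF weighting assumed).
    return Counter(tokens)


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity; Counter returns 0 for terms absent from b.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_sim(vec: Counter, others: list[Counter]) -> float:
    # Mean cosine similarity of one vector to a list of vectors.
    return sum(cosine(vec, o) for o in others) / len(others) if others else 0.0


@dataclass
class Category:
    members: list[Counter] = field(default_factory=list)
    center: int = 0  # index in members of the news acting as cluster center


def add_news(categories: list[Category], text: str, threshold: float = 0.3) -> int:
    """Assign one news item incrementally; return its category index."""
    s = vsm(segment(text))
    if not categories:
        # First news: it founds a new category and becomes its center.
        categories.append(Category(members=[s]))
        return 0
    # Similarity of S to the cluster center of every existing category.
    sims = [cosine(s, c.members[c.center]) for c in categories]
    best = max(range(len(sims)), key=sims.__getitem__)
    if sims[best] <= threshold:
        # Not above the threshold (equality treated as "not above"): new category.
        categories.append(Category(members=[s]))
        return len(categories) - 1
    cat = categories[best]
    center_vec = cat.members[cat.center]
    # M1: mean similarity of S to the existing members of C.
    m1 = mean_sim(s, cat.members)
    # M2 (interpretation): mean similarity of the current center to the
    # other members of C, counting S as one of them.
    rest = [m for i, m in enumerate(cat.members) if i != cat.center] + [s]
    m2 = mean_sim(center_vec, rest)
    cat.members.append(s)
    if m1 > m2:
        cat.center = len(cat.members) - 1  # S becomes the new cluster center
    return best
```

Feeding the items "stock market rises", "stock market falls", "football team wins match" and "stock market" in that order yields two categories, and the last item replaces "stock market rises" as the center of the first one, because it is on average more similar to the other members. Comparing mean similarities in this way makes the center behave like a medoid (an actual news item) rather than an averaged centroid vector.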