Clustering method based on big data



Bibliographic details
Main authors: MA XIAOXIAO, WEN DACHUAN, WU CHUNCAI, WEN BIN, YAO QINGLIN, YANG SHUHAI, FENG LIANGHUAI
Format: Patent
Language: Chinese; English
Description
Abstract: The invention discloses a clustering method based on big data. The method comprises the following steps: segmenting news D to obtain news S; determining whether news S is the first news item, and if so, establishing a new category based on news S, otherwise establishing a VSM (vector space model) vector for news S and calculating the similarity between news S and all categories of the cluster centers; finding the category C with the greatest similarity to news S, and if the similarity between news S and category C is greater than a preset threshold, classifying news S into category C, or if the similarity is less than the preset threshold, establishing a new category based on news S; calculating a similarity mean M1 between news S and the other news in category C, and a similarity mean M2 between the other news in category C and the other news in the cluster center, and if M1 is greater than M2, updating news S to be the new cluster center, otherwise keeping the cluster center unchanged; d
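The single-pass procedure in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. Whitespace tokenization stands in for the segmentation step, raw term frequencies stand in for the VSM weights, and cosine similarity is assumed as the similarity measure. Each category's center is taken to be one of its member news items. The abstract is ambiguous about M2. Here it is read as the mean similarity of the current center to the other news in category C. The names `segment`, `vsm_vector`, `cosine`, `Category` and `cluster`, and the threshold value 0.3, are hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def segment(text: str) -> list[str]:
    # Hypothetical stand-in for the segmentation step: lowercase and split on
    # whitespace. Chinese news would need a real word segmenter instead.
    return text.lower().split()


def vsm_vector(tokens: list[str]) -> Counter:
    # Sparse vector-space-model vector; raw term-frequency weighting is an assumption.
    return Counter(tokens)


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term vectors (0.0 if either is empty).
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class Category:
    members: list[Counter] = field(default_factory=list)
    center: int = 0  # index of the member news currently acting as cluster center


def mean_sim(vec: Counter, others: list[Counter]) -> float:
    # Mean similarity of one vector to a list of vectors (0.0 for an empty list).
    return sum(cosine(vec, o) for o in others) / len(others) if others else 0.0


def cluster(news: list[str], threshold: float = 0.3) -> list[Category]:
    categories: list[Category] = []
    for text in news:
        s = vsm_vector(segment(text))
        if not categories:
            # First news item: it founds the first category and is its center.
            categories.append(Category(members=[s]))
            continue
        # Similarity of S to every category, measured against each category's center.
        sims = [cosine(s, c.members[c.center]) for c in categories]
        best = max(range(len(sims)), key=lambda i: sims[i])
        if sims[best] <= threshold:
            # Not similar enough: start a new category (ties go here, an assumption).
            categories.append(Category(members=[s]))
            continue
        c = categories[best]
        c.members.append(s)
        s_idx = len(c.members) - 1
        center_vec = c.members[c.center]
        # M1: mean similarity of S to the other news in C.
        m1 = mean_sim(s, [m for i, m in enumerate(c.members) if i != s_idx])
        # M2 (assumed reading): mean similarity of the current center to the other news in C.
        m2 = mean_sim(center_vec, [m for i, m in enumerate(c.members) if i != c.center])
        if m1 > m2:
            c.center = s_idx  # S becomes the new cluster center
    return categories


if __name__ == "__main__":
    docs = [
        "stock market rises on tech earnings",
        "tech earnings lift the stock market",
        "heavy rain floods city streets",
    ]
    for i, c in enumerate(cluster(docs, threshold=0.3)):
        print(i, len(c.members), "center:", c.center)
```

In this example the two finance headlines share four of six terms (cosine 4/6, above the threshold) and land in one category, while the weather headline shares none and founds a second category. Comparing each new item only against category centers keeps the pass linear in the number of categories per item, which is presumably why the method maintains explicit centers rather than comparing against every member.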