Clustering method based on big data
Format: | Patent |
---|---|
Language: | Chinese; English |
Abstract: | The invention discloses a clustering method based on big data. The method comprises the following steps: segmenting news item D to obtain news S; determining whether S is the first news item, and if so, creating a new category from S, and if not, building a vector space model (VSM) vector for S and calculating the similarity between S and the cluster center of every category; finding the category C with the greatest similarity to S, classifying S into C if that similarity is greater than a preset threshold, and creating a new category from S if the similarity is less than the preset threshold; calculating the mean similarity M1 between S and the other news in category C and the mean similarity M2 between the other news in C and the cluster center, and if M1 is greater than M2, making S the new cluster center, otherwise keeping the cluster center unchanged; d... |
---|
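The single-pass procedure in the abstract can be sketched in Python as follows. This is a minimal illustration, not the patented implementation, and it rests on several assumptions the record does not state: whitespace tokenization stands in for word segmentation (Chinese news would need a dedicated segmenter), the VSM uses raw term frequencies, similarity is cosine similarity, a category's similarity to S is taken as the similarity to that category's current cluster-center item, and M2 is read as the mean similarity between the current center and the other members of C. The names `segment`, `vsm`, `cosine`, `Category`, `add_news`, and the threshold value 0.3 are hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def segment(text: str) -> list[str]:
    # Hypothetical stand-in for word segmentation: lowercase + whitespace split.
    return text.lower().split()


def vsm(tokens: list[str]) -> Counter:
    # Assumed VSM: raw term-frequency vector (TF-IDF is a common alternative).
    return Counter(tokens)


def cosine(a: Counter, b: Counter) -> float:
    # Assumed similarity measure: cosine similarity of two sparse vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class Category:
    members: list[Counter] = field(default_factory=list)
    center: int = 0  # index into `members` of the current cluster center


def mean_sim(vec: Counter, others: list[Counter]) -> float:
    # Mean cosine similarity between `vec` and each vector in `others`.
    return sum(cosine(vec, o) for o in others) / len(others) if others else 0.0


def add_news(categories: list[Category], text: str, threshold: float = 0.3) -> int:
    """Assign one news item to a category and return that category's index."""
    s = vsm(segment(text))
    if not categories:  # first news item: it founds the first category
        categories.append(Category(members=[s]))
        return 0
    # Similarity of S to each category, measured against the category's center.
    sims = [cosine(s, c.members[c.center]) for c in categories]
    best = max(range(len(categories)), key=sims.__getitem__)
    if sims[best] <= threshold:  # not similar enough (ties treated as "new")
        categories.append(Category(members=[s]))
        return len(categories) - 1
    c = categories[best]
    c.members.append(s)
    s_idx = len(c.members) - 1
    center_vec = c.members[c.center]
    # M1: S versus the other members of C; M2: current center versus the others.
    m1 = mean_sim(s, [m for i, m in enumerate(c.members) if i != s_idx])
    m2 = mean_sim(center_vec, [m for i, m in enumerate(c.members) if i != c.center])
    if m1 > m2:
        c.center = s_idx  # S is more representative: promote it to center
    return best
```

For example, feeding `"a b"`, `"a b c d"`, and `"a b c"` in that order puts all three into category 0, and the third item becomes the new center because its mean similarity to the other members is higher than the original center's. Comparing each item only against one center per category keeps the cost per item proportional to the number of categories, which is the usual motivation for single-pass clustering on large news streams.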