Webpage clustering method and system based on BAMIC multi-example algorithm and storage medium

Bibliographic details
Main author: CAI CANHONG
Format: Patent
Language: Chinese; English
Description
Summary: The invention relates to the field of clustering analysis and discloses a webpage clustering method and system based on a BAMIC multi-example algorithm, and a storage medium. The method comprises the steps of: collecting webpage text data and preprocessing it; selecting k pieces of webpage text data from the collected data to serve as cluster centers; using a Hausdorff distance based on an OWA operator to calculate the distance between each remaining piece of webpage text data and each cluster center, and assigning each remaining piece to the nearest cluster to form new clusters; calculating the distances between the pieces of webpage text data within each new cluster and determining the center of that cluster; repeating the process to obtain a plurality of clustering division results, wherein each clustering division result comprises n cluster centers and n clusters; and evaluating each clustering division result to obtain an optimal number of clusters. According to the me
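
The clustering steps named in the summary can be illustrated with a minimal Python sketch. It is not the patent's implementation: the record does not give the exact form of the OWA-based Hausdorff distance, so the sketch assumes that the OWA operator replaces the outer maximum of the classic Hausdorff distance, and it assumes that each webpage has already been preprocessed into a "bag" (a 2-D array with one feature vector per text fragment). The function names, the linearly decreasing weight scheme, and the random choice of initial centers are all hypothetical.

```python
import numpy as np

def owa(values, weights):
    """Ordered weighted average: sort values in descending order, then take a weighted sum."""
    ordered = np.sort(np.asarray(values, dtype=float))[::-1]
    return float(np.dot(ordered, weights))

def linear_weights(n):
    """Hypothetical weight scheme: linearly decreasing weights that sum to 1."""
    w = np.arange(n, 0, -1, dtype=float)
    return w / w.sum()

def directed_owa(a, b):
    """Aggregate, with OWA, the distance from each instance in bag a to its nearest instance in bag b."""
    pairwise = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    nearest = pairwise.min(axis=1)
    return owa(nearest, linear_weights(len(nearest)))

def owa_hausdorff(a, b):
    """Symmetric bag distance (assumed form): the larger of the two directed OWA distances."""
    return max(directed_owa(a, b), directed_owa(b, a))

def cluster_bags(bags, k, n_iter=20, seed=0):
    """k-medoids-style clustering of bags; returns (labels, indices of the center bags)."""
    rng = np.random.default_rng(seed)
    n = len(bags)
    # Precompute all bag-to-bag distances once.
    dist = np.array([[owa_hausdorff(bags[i], bags[j]) for j in range(n)]
                     for i in range(n)])
    # Assumption: the k initial centers are drawn at random.
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(n_iter):
        # Assign every bag to its nearest center.
        labels = dist[:, medoids].argmin(axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size:
                # New center: the member with the smallest total distance to the others.
                within = dist[np.ix_(members, members)].sum(axis=1)
                new_medoids[c] = members[within.argmin()]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = dist[:, medoids].argmin(axis=1)
    return labels, medoids
```

With weights of the form (1, 0, ..., 0) the OWA aggregation reduces to the ordinary maximum, and with uniform weights it reduces to the mean. Calling `cluster_bags` for several values of k gives the candidate divisions that the summary says are then evaluated; the evaluation criterion is not named in this record, so that step is left out of the sketch.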