Webpage clustering method and system based on BAMIC multi-example algorithm and storage medium
Main Author: | |
---|---|
Format: | Patent |
Language: | Chinese ; English |
Abstract: | The invention relates to the field of clustering analysis and discloses a webpage clustering method and system based on the BAMIC multi-example algorithm, together with a storage medium. The method comprises the following steps: collecting webpage text data and preprocessing it; selecting k pieces of webpage text data from the collected data to serve as cluster centers; using a Hausdorff distance based on an OWA operator to calculate the distance between each remaining piece of webpage text data and each cluster center, and assigning each remaining piece to the nearest cluster to form new clusters; calculating the distances between the webpage text data within each new cluster and determining the center of that cluster; repeatedly obtaining a plurality of clustering division results, wherein each clustering division result comprises n cluster centers and n clusters; and evaluating each clustering division result to obtain an optimal cluster number. According to the me |
---|
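
The assignment and center-update steps named in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes that each webpage is a "bag" of numeric feature vectors (for example, one vector per text segment), that the OWA operator is the usual ordered weighted average (sort descending, then take a weighted sum), and that the OWA-based Hausdorff distance replaces the outer maximum of the classical Hausdorff distance with that average. The names `owa`, `owa_hausdorff`, `assign`, and `cluster_bags`, the weight vectors, and the toy data are all hypothetical, and the final step of evaluating several partitions to pick a cluster number is omitted.

```python
import math
import random


def owa(values, weights):
    """Ordered weighted average: sort values descending, then weighted sum."""
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))


def uniform_weights(n):
    """Assumed weighting: plain mean of the ordered values."""
    return [1.0 / n] * n


def max_weights(n):
    """All weight on the largest value; recovers the classical Hausdorff distance."""
    return [1.0] + [0.0] * (n - 1)


def directed_distance(bag_a, bag_b, make_weights):
    """OWA-aggregate the nearest-neighbour distance of every instance in bag_a."""
    nearest = [min(math.dist(a, b) for b in bag_b) for a in bag_a]
    return owa(nearest, make_weights(len(nearest)))


def owa_hausdorff(bag_a, bag_b, make_weights=uniform_weights):
    """Symmetric bag distance: the larger of the two directed distances."""
    return max(directed_distance(bag_a, bag_b, make_weights),
               directed_distance(bag_b, bag_a, make_weights))


def assign(bags, medoids, distance):
    """Put every bag into the cluster of its nearest center (medoid)."""
    clusters = {m: [m] for m in medoids}
    for i in range(len(bags)):
        if i in clusters:  # a center always stays in its own cluster
            continue
        nearest = min(medoids, key=lambda m: distance(bags[i], bags[m]))
        clusters[nearest].append(i)
    return clusters


def cluster_bags(bags, k, distance=owa_hausdorff, max_iter=100, seed=0):
    """k-medoids style loop: assign bags, then re-pick each cluster's center."""
    rng = random.Random(seed)
    medoids = rng.sample(range(len(bags)), k)
    for _ in range(max_iter):
        clusters = assign(bags, medoids, distance)
        # New center: the member with the smallest total distance to its cluster.
        updated = [
            min(members,
                key=lambda c: sum(distance(bags[c], bags[j]) for j in members))
            for members in clusters.values()
        ]
        if set(updated) == set(medoids):
            break
        medoids = updated
    return assign(bags, medoids, distance)


# Toy data (made up): four pages, two instances each, in two separated groups.
bags = [
    [(0.0, 0.0), (0.1, 0.0)],
    [(0.0, 0.1), (0.1, 0.1)],
    [(5.0, 5.0), (5.1, 5.0)],
    [(5.0, 5.1), (5.1, 5.1)],
]
result = cluster_bags(bags, k=2)
partition = sorted(sorted(members) for members in result.values())
print(partition)
```

Passing `make_weights=max_weights` to `owa_hausdorff` gives the classical Hausdorff distance, which makes it easy to compare the two variants on the same data. The returned dictionary maps each center's index to the indices of the bags in its cluster.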