Clustering by previous representative


Bibliographic details
Main authors: Kupke, Joachim; Proudfoot, David Michael
Format: Patent
Language: English
Description
Summary: A method may include identifying documents in a current clustering operation, assigning the identified documents to one or more clusters, selecting a current representative document for each of the one or more clusters, determining whether the current representative document has been re-crawled, determining a previous representative document with which the current representative document was previously associated in a prior clustering operation, if it is determined that the current representative document has not been re-crawled, determining one of the one or more clusters to which the previous representative document has been assigned in the current clustering operation, combining one of the one or more clusters associated with the current representative document that has not been re-crawled with the one of the one or more clusters associated with the previous representative document into a combined cluster, and storing information regarding the combined cluster.
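
The steps in the summary can be illustrated with a minimal Python sketch. It is not the patented implementation, only one possible reading of the abstract under stated assumptions: documents are first grouped by a hypothetical `content_key`, the representative is chosen by a hypothetical `score`, and the prior clustering is passed in as a `previous_rep` mapping from document ID to its earlier representative. All names here (`Doc`, `pick_representative`, `cluster_by_previous_representative`) are invented for illustration, and the union-find merge is a design choice of the sketch, not something the record specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    content_key: str   # assumption: key used for the initial cluster assignment
    score: float       # assumption: quality score used to pick a representative
    recrawled: bool    # True if the document was re-crawled since the prior run


def pick_representative(members):
    # Assumption: highest score wins; doc_id breaks ties deterministically.
    return max(members, key=lambda d: (d.score, d.doc_id))


def cluster_by_previous_representative(docs, previous_rep):
    """Return {representative doc_id: sorted member doc_ids} after merging.

    previous_rep maps a doc_id to the representative it was associated
    with in the prior clustering operation (hypothetical input format).
    """
    # Assign the identified documents to clusters.
    clusters = {}
    for d in docs:
        clusters.setdefault(d.content_key, []).append(d)
    cluster_of = {d.doc_id: d.content_key for d in docs}

    # Union-find over cluster keys, so chained merges resolve consistently.
    parent = {key: key for key in clusters}

    def find(key):
        while parent[key] != key:
            parent[key] = parent[parent[key]]
            key = parent[key]
        return key

    for key, members in clusters.items():
        rep = pick_representative(members)
        if rep.recrawled:
            continue  # only representatives that were NOT re-crawled trigger a merge
        prev = previous_rep.get(rep.doc_id)
        if prev is None or prev not in cluster_of:
            continue  # no usable previous representative in the current run
        a, b = find(key), find(cluster_of[prev])
        if a != b:
            parent[a] = b  # combine this cluster with the previous representative's

    # Collect the combined clusters; the returned mapping stands in for
    # "storing information regarding the combined cluster".
    combined = {}
    for key, members in clusters.items():
        combined.setdefault(find(key), []).extend(members)
    return {
        pick_representative(m).doc_id: sorted(d.doc_id for d in m)
        for m in combined.values()
    }


docs = [
    Doc("a", "k1", 0.9, recrawled=True),
    Doc("b", "k1", 0.5, recrawled=True),
    Doc("c", "k2", 0.7, recrawled=False),
    Doc("d", "k3", 0.4, recrawled=True),
]
previous_rep = {"c": "a", "d": "a"}
result = cluster_by_previous_representative(docs, previous_rep)
```

In this example, document `c` has not been re-crawled and was previously represented by `a`, so its cluster is combined with the cluster that now contains `a`. Document `d` also had `a` as its previous representative, but it has been re-crawled, so its cluster is left alone.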