K-tree: Large Scale Document Clustering

Bibliographic Details
Published in:	arXiv.org 2010-01
Main Authors:	De Vries, Christopher M; Geva, Shlomo
Format:	Article
Language:	English
Subjects:	Algorithms; Cluster analysis; Clustering; Computer Science - Artificial Intelligence; Computer Science - Data Structures and Algorithms; Computer Science - Information Retrieval; Information retrieval; Vector quantization
Online Access:	Full text

Description
Summary:	We introduce K-tree in an information retrieval context. It is an efficient approximation of the k-means clustering algorithm. Unlike k-means, it forms a hierarchy of clusters. It has been extended to address issues with sparse representations. We compare performance and quality to CLUTO using document collections. The K-tree has a low time complexity that is suitable for large document collections. This tree structure allows for efficient disk-based implementations where space requirements exceed that of main memory.
ISSN:	2331-8422
DOI:	10.48550/arxiv.1001.0830