Scalability of OAT

Bibliographic details
Main authors: Mizher, J., Dunham, M.H., Lin Lu, Yongqiao Xiao
Format: Conference proceeding
Language: English
Description
Summary: Summary form only given. Mining user access patterns from clickstream data has attracted much attention from the research community. However, scalability testing of the corresponding mining algorithms has been virtually ignored. The memory requirements of these algorithms may be quite large because they often assume in-memory data structures whose size depends on the number and length of patterns. Because the scalability of algorithms is important to the usefulness of Web usage mining (WUM) techniques, we propose two new sampling techniques, continuous and random, which can be applied to static-sized test datasets to examine WUM algorithm scalability. We illustrate the usefulness of these scalability approaches by performing scalability tests using the online adaptive traversal (OAT) pattern mining algorithm. These experiments show that the OAT algorithm adapts to the amount of available memory and that its time requirements grow at a linear rate. This paper has several results: 1. The OAT algorithm is shown to be scalable in both space and time. The time grows at a linear rate, while the space adapts to available memory through compression. 2. Two sampling techniques are presented which facilitate scalability experiments against fixed-size Web logs. 3. Spiders crawling the Web can have a disastrous impact on programs that collect WUM statistics and patterns.
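The summary names the two sampling techniques but does not define them. The sketch below shows one plausible reading, purely as an illustration: "continuous" is assumed to mean taking a contiguous prefix of the log, and "random" is assumed to mean drawing a uniform random subset of sessions while preserving their original order. The function names, the `Session` type, and the `fraction` and `seed` parameters are hypothetical and do not come from the paper.

```python
import random
from typing import List, Sequence

# Hypothetical representation: one session is an ordered list of page IDs.
Session = List[str]


def _sample_size(total: int, fraction: float) -> int:
    """Number of sessions to keep for a given fraction of the log."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return int(total * fraction)


def continuous_sample(sessions: Sequence[Session], fraction: float) -> List[Session]:
    """Assumed reading of 'continuous': keep a contiguous prefix of the log."""
    return list(sessions[:_sample_size(len(sessions), fraction)])


def random_sample(sessions: Sequence[Session], fraction: float, seed: int = 0) -> List[Session]:
    """Assumed reading of 'random': keep a uniform random subset of sessions,
    returned in their original log order."""
    count = _sample_size(len(sessions), fraction)
    rng = random.Random(seed)  # seeded so that experiments are repeatable
    chosen = sorted(rng.sample(range(len(sessions)), count))
    return [sessions[i] for i in chosen]


if __name__ == "__main__":
    # Toy fixed-size log; a real experiment would load sessions from a Web log.
    log = [[f"page{i}", f"page{i + 1}"] for i in range(10)]
    for fraction in (0.2, 0.4, 0.6, 0.8, 1.0):
        prefix = continuous_sample(log, fraction)
        subset = random_sample(log, fraction, seed=42)
        print(fraction, len(prefix), len(subset))
```

Under these assumptions, running a mining algorithm on samples of increasing size from the same fixed-size log gives a series of input sizes against which time and memory can be measured.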
ISSN: 2161-5322, 2161-5330
DOI:10.1109/AICCSA.2005.1387045