Scalability of OAT
Main authors: | , , , |
---|---|
Format: | Conference paper |
Language: | English |
Subjects: | |
Abstract: | Summary form only given. Mining user access patterns from clickstream data has attracted much attention from the research community. However, the scalability testing of the corresponding mining algorithms has been virtually ignored. The memory requirements of these algorithms may be quite large, because in-memory data structures whose size depends on the number and length of patterns are often assumed. Because scalability is central to the usefulness of Web usage mining (WUM) techniques, we propose two new sampling techniques, continuous and random, which can be applied to static-sized test datasets to examine WUM algorithm scalability. We illustrate the usefulness of these approaches by performing scalability tests on the online adaptive traversal (OAT) pattern mining algorithm. These experiments show that the OAT algorithm adjusts to the amount of available memory and that its time requirements grow at a linear rate. This paper has several results: 1. The OAT algorithm is shown to be scalable in both space and time. The time grows at a linear rate, while the space adapts to available memory through compression. 2. Two sampling techniques are presented that facilitate scalability experiments against fixed-size Web logs. 3. Spiders crawling the Web can have a disastrous impact on programs that collect WUM statistics and patterns. |
---|---|
ISSN: | 2161-5322, 2161-5330 |
DOI: | 10.1109/AICCSA.2005.1387045 |
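The abstract names two sampling techniques, continuous and random, for deriving test datasets of increasing size from a single fixed-size Web log, but does not define them. The sketch below is only an illustration under stated assumptions: "continuous" is read as taking a contiguous run of records, and "random" as drawing records uniformly without replacement while preserving their original order. The function names `continuous_sample`, `random_sample`, and `scalability_series` are hypothetical and do not come from the paper. The functions accept any sequence, so the unit of sampling could be individual requests or whole user sessions.

```python
import random
from typing import Callable, Dict, List, Sequence, TypeVar

T = TypeVar("T")


def continuous_sample(log: Sequence[T], fraction: float, start: int = 0) -> List[T]:
    """ASSUMPTION: a 'continuous' sample is a contiguous run of records that
    covers `fraction` of the log, beginning at index `start`."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    size = round(len(log) * fraction)
    if start < 0 or start + size > len(log):
        raise ValueError("sample window exceeds log bounds")
    return list(log[start:start + size])


def random_sample(log: Sequence[T], fraction: float, seed: int = 0) -> List[T]:
    """ASSUMPTION: a 'random' sample draws `fraction` of the records uniformly
    without replacement; the chosen records are kept in their original log
    order so that the temporal ordering of clicks survives."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    size = round(len(log) * fraction)
    rng = random.Random(seed)  # seeded so that repeated runs give the same subset
    indices = sorted(rng.sample(range(len(log)), size))
    return [log[i] for i in indices]


def scalability_series(
    log: Sequence[T],
    fractions: Sequence[float],
    sampler: Callable[[Sequence[T], float], List[T]],
) -> Dict[float, List[T]]:
    """Build one test dataset per fraction from the same fixed-size log."""
    return {f: sampler(log, f) for f in fractions}


if __name__ == "__main__":
    log = [f"req{i}" for i in range(1000)]  # hypothetical stand-in for a Web log
    for f, subset in scalability_series(log, [0.25, 0.5, 0.75, 1.0], random_sample).items():
        print(f, len(subset))
```

Each subset could then be fed to the mining algorithm under test while its time and memory use are recorded, so that growth can be plotted against input size. This design choice keeps the sampling independent of the algorithm being measured.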