A new method for mining of WWW access sequences

Bibliographic details
Published in: Electronics & Communications in Japan. Part 2, Electronics, 2007-10, Vol.90 (10), p.127-138
Main authors: Oyanagi, Shigeru; Kamiharako, Masatoshi; Kubota, Kazuto; Nakase, Akihiko
Format: Article
Language: English
Description
Summary: Analysis of access sequences is an important technique in the mining of WWW access logs. The well-known Apriori algorithm is a typical method. A problem with this method is that the relations it finds between sequences are not reflected in the output. This paper proposes a new method of sequence analysis using matrix clustering. The method considers a binary matrix in which the rows correspond to sequences and the columns correspond to ordered pairs of pages. Similarities between sequences are extracted as clusters in the matrix. Based on these clusters, super-sequences, which are generalizations of similar sequences, can be generated. The proposed method is applied to real data and the results are evaluated. It is verified that the features of entire sequences can be extracted. © 2007 Wiley Periodicals, Inc. Electron Comm Jpn Pt 2, 90(10): 127–138, 2007; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/ecjb.20394
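
The binary matrix described in the summary can be illustrated with a minimal sketch. The summary does not say how the ordered pairs of pages are chosen, so this example assumes they are adjacent page transitions within a sequence. The function names and the sample sessions are hypothetical and are not taken from the paper, and the clustering step itself is not shown.

```python
def ordered_pairs(sequence):
    """Return the set of adjacent (from_page, to_page) pairs in one sequence.

    Assumption: an "ordered pair" means two pages visited consecutively.
    """
    return set(zip(sequence, sequence[1:]))


def build_binary_matrix(sequences):
    """Build a binary matrix: one row per sequence, one column per ordered pair.

    An entry is 1 if the sequence contains that ordered pair, else 0.
    """
    pair_sets = [ordered_pairs(s) for s in sequences]
    # Columns are all ordered pairs seen in any sequence, in sorted order.
    columns = sorted(set().union(*pair_sets))
    matrix = [[1 if col in pairs else 0 for col in columns] for pairs in pair_sets]
    return columns, matrix


# Hypothetical access sequences (page identifiers are made up).
sessions = [
    ["A", "B", "C"],
    ["A", "B", "D"],
    ["C", "D"],
]

columns, matrix = build_binary_matrix(sessions)
for session, row in zip(sessions, matrix):
    print(session, row)
```

In this toy input the first two sessions share the column for the pair (A, B), which is the kind of overlap a matrix clustering step would group together.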
ISSN: 8756-663X, 1520-6432, 0915-1893
DOI: 10.1002/ecjb.20394