A new method for mining of WWW access sequences
Published in: | Electronics & Communications in Japan. Part 2, Electronics, 2007-10, Vol.90 (10), p.127-138 |
---|---|
Main authors: | , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | Analysis of access sequences is an important technique in the mining of WWW access logs. The well‐known apriori algorithm is a typical method. A problem of this method is that the obtained relation between sequences is not reflected in the output. This paper proposes a new method of sequence analysis using matrix clustering. This method considers a binary matrix in which the sequences correspond to the rows and ordered pairs of pages correspond to the columns. The similarities between sequences are extracted as clusters in the matrix. Based on these clusters, super‐sequences, which are generalizations of similar sequences, can be generated. The proposed method is applied to real data and the results are evaluated. It is verified that the features of entire sequences can be extracted. © 2007 Wiley Periodicals, Inc. Electron Comm Jpn Pt 2, 90(10): 127–138, 2007; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/ecjb.20394 |
---|---|
ISSN: | 8756-663X, 1520-6432, 0915-1893 |
DOI: | 10.1002/ecjb.20394 |
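
The abstract describes a binary matrix whose rows are access sequences and whose columns are ordered pairs of pages. The Python sketch below illustrates only that construction step and is not the authors' implementation. It assumes that a pair (a, b) is set to 1 whenever page a is visited somewhere before page b in a sequence; the record does not say whether the pages must be adjacent. The function names `ordered_pairs` and `build_pair_matrix` and the sample sequences are hypothetical, and the clustering step itself is not shown because the record gives no details of it.

```python
from itertools import combinations


def ordered_pairs(sequence):
    """Return the set of (earlier, later) page pairs in one access sequence.

    Assumption: a pair counts if the first page occurs anywhere before the
    second one, not only when the two pages are adjacent.
    """
    # combinations() keeps the input order, so each tuple is (earlier, later).
    return {(a, b) for a, b in combinations(sequence, 2) if a != b}


def build_pair_matrix(sequences):
    """Build a binary matrix: one row per sequence, one column per ordered pair."""
    row_pairs = [ordered_pairs(seq) for seq in sequences]
    # Column order is simply the sorted union of all pairs seen in the data.
    columns = sorted(set().union(*row_pairs))
    matrix = [[1 if col in pairs else 0 for col in columns] for pairs in row_pairs]
    return columns, matrix


# Hypothetical access sequences, one list of page identifiers per session.
sequences = [
    ["A", "B", "C"],
    ["A", "C"],
    ["B", "A"],
]
columns, matrix = build_pair_matrix(sequences)

for seq, row in zip(sequences, matrix):
    print(seq, row)
```

Rows that share many 1-columns correspond to sequences that traverse the same pages in the same order, which is the kind of similarity the abstract says is extracted as clusters. Note that (A, B) and (B, A) are distinct columns, so the direction of navigation is preserved.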