An integrated system of mining HTML texts and filtering structured documents

This paper presents a method of mining HTML documents into structured documents and of filtering structured documents by using both slot weighting and token weighting. The goal of a mining algorithm is to find slot-token patterns in HTML documents. In order to express user interests in structured do...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: Yun, Bo-Hyun, Lim, Myung-Eun, Park, Soo-Hyun
Format: Tagungsbericht
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:This paper presents a method of mining HTML documents into structured documents and of filtering structured documents by using both slot weighting and token weighting. The goal of a mining algorithm is to find slot-token patterns in HTML documents. In order to express user interests in structured document filtering, slot and token are considered. Our preference computation algorithm applies vector similarity and Bayesian probability to filter structured documents. The experimental results show that it is important to consider hyperlinking and unlablelling in mining HTML texts; slot and token weighting can enhance the performance of structured document filtering.
ISSN:0302-9743
1611-3349
DOI:10.5555/1760894.1760940