Semantics-Based Extraction of Webpage Main Text


Bibliographic Details
Main Authors:	Han Fengjiao, Zhou Zhurong
Format:	Conference Proceeding
Language:	English
Subjects:	Accuracy; Computers; Data mining; Educational institutions; Extraction; HTML; Navigation; Semantics; Webpage

Description
Summary:	Extracting the main text of a web page is one of the most effective ways to improve search engines. Traditional methods use the similarity of DOM sub-trees as the end condition for DOM tree traversal, but this approach is unsatisfactorily slow on web pages with complex structure. To improve both the speed and the accuracy of DOM sub-tree traversal, we propose a semantics-based method for extracting the main text of a web page.
DOI:	10.1109/SKG.2012.47