Semantics-Based Extraction of Webpage Main Text
Main authors: | , |
---|---|
Format: | Conference proceeding |
Language: | English |
Subjects: | |
Online access: | Order full text |
Abstract: | Extracting the main text of a web page is one of the most effective ways to improve search engines. Traditional methods use the similarity of DOM sub-trees as the termination condition for DOM tree traversal, but they are unsatisfactorily slow on web pages with complex structure. To improve both the speed and the accuracy of DOM sub-tree traversal, we propose a method called Semantics-based Extraction of Web page Main text. |
---|---|
DOI: | 10.1109/SKG.2012.47 |