Webpage type judgment method and device

Bibliographic details
Main author: XIE XINGBO
Format: Patent
Language: Chinese; English
Description
Summary: The invention discloses a method and device for judging the type of a webpage. The method comprises: obtaining the HTML source code of a target webpage; constructing a node tree from the HTML source code, where the node tree is built from the multiple types of nodes in the source code; extracting webpage features from the node tree to obtain a webpage feature set; and using each feature in the set to judge the webpage type of the target webpage. The method addresses the low accuracy of page type identification in the related art.
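
The four steps in the summary can be illustrated with a minimal Python sketch. The record does not say which features are extracted or how the type is decided, so everything specific here is an assumption made for illustration: the `Node` and `TreeBuilder` classes, the chosen features (link count, paragraph count, visible text length, tree depth), the two page types ("list" and "content"), and the threshold rule in `judge_page_type`. It is not the patented method.

```python
from collections import Counter
from dataclasses import dataclass, field
from html.parser import HTMLParser

# Void elements never have a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


@dataclass
class Node:
    kind: str                      # "element" or "text" (two node types)
    name: str = ""                 # tag name for element nodes
    text: str = ""                 # content for text nodes
    children: list = field(default_factory=list)


class TreeBuilder(HTMLParser):
    """Step 2: build a node tree from HTML source (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.root = Node("element", "#document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node("element", tag)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open element; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].name == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Skip whitespace-only text and the contents of script/style elements.
        if data.strip() and self.stack[-1].name not in ("script", "style"):
            self.stack[-1].children.append(Node("text", text=data.strip()))


def extract_features(root):
    """Step 3: walk the tree and collect an (assumed) feature set."""
    tags, text_chars, max_depth = Counter(), 0, 0
    todo = [(root, 0)]
    while todo:
        node, depth = todo.pop()
        max_depth = max(max_depth, depth)
        if node.kind == "text":
            text_chars += len(node.text)
        else:
            tags[node.name] += 1
        todo.extend((child, depth + 1) for child in node.children)
    return {"links": tags["a"], "paragraphs": tags["p"],
            "text_chars": text_chars, "max_depth": max_depth}


def judge_page_type(features):
    """Step 4: an assumed rule; many links with little text per link -> list."""
    links = features["links"]
    if links >= 3 and features["text_chars"] / links < 50:
        return "list"
    return "content"


def judge_html(html):
    """Steps 2 to 4 for HTML source that has already been obtained (step 1)."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return judge_page_type(extract_features(builder.root))
```

A page made mostly of short links, such as a navigation or index page, would be judged "list" under this rule, while a page dominated by paragraph text would be judged "content". A real system would likely use a richer feature set and a trained classifier rather than a fixed threshold.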