End-to-end pornographic website detection method based on HTML (Hypertext Markup Language) structural features


Bibliographic details
Main authors: GUO CHENGYU, XIN YONGHUI, PAN JIN, ZHANG CUI, LIU YANG, CHEN MUQIAN, ZHAO CHUNLU
Format: Patent
Language: Chinese; English
Description
Abstract: The invention relates to the technical field of detection and search, and discloses an end-to-end pornographic website detection method based on HTML (Hypertext Markup Language) structural features. The method studies the website ranking mechanism of search engines and the tag structure of HTML, and extracts the meta tags from the HTML source code as a text data set. A collaborative BiLSTM + TextCNN + Attention model, comprising a word embedding layer, a Bi-LSTM (Bidirectional Long Short-Term Memory) layer, a convolution layer and an Attention layer, is constructed and used to obtain the end-to-end pornographic website detection result.
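The meta tag extraction step mentioned in the abstract could be sketched as follows, using only the Python standard library. This is an illustrative assumption, not the patented implementation: the record does not say which meta tags are used, so the choice of `keywords` and `description`, and the names `MetaTagExtractor` and `extract_meta_text`, are hypothetical.

```python
from html.parser import HTMLParser


class MetaTagExtractor(HTMLParser):
    """Collects the content attribute of selected <meta> tags."""

    # Assumption: these meta names carry the text of interest.
    # The catalog record does not specify which ones the method uses.
    WANTED = {"keywords", "description"}

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        content = attr.get("content")
        if name in self.WANTED and content:
            self.fields[name] = content.strip()


def extract_meta_text(html: str) -> str:
    """Return the selected meta contents joined into one text sample."""
    parser = MetaTagExtractor()
    parser.feed(html)
    parser.close()
    # Sort by meta name so the output order is deterministic.
    return " ".join(parser.fields[k] for k in sorted(parser.fields))


if __name__ == "__main__":
    page = (
        '<html><head><meta charset="utf-8">'
        '<meta name="keywords" content="news, weather">'
        '<meta name="description" content="Local news portal">'
        "</head><body></body></html>"
    )
    print(extract_meta_text(page))
```

The resulting string would then be tokenized and passed to the word embedding layer; that downstream model is outside the scope of this sketch.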