End-to-end pornographic website detection method based on HTML (Hypertext Markup Language) structural features
Format: Patent
Language: Chinese; English
Abstract: The invention relates to the technical field of detection and search, and discloses an end-to-end pornographic website detection method based on HTML (Hypertext Markup Language) structural features. The method studies the ranking mechanism that search engines apply to websites and the structural features of HTML tags, and extracts the meta tags from a page's HTML source code to build a text data set. On this data set, a collaborative BiLSTM + TextCNN + Attention model is constructed for pornographic website detection. The model consists of a word embedding layer, a Bi-LSTM (Bidirectional Long Short-Term Memory) layer, a convolution layer and an Attention layer, and outputs the detection result end to end.
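The meta-tag extraction step could be sketched with Python's standard-library `html.parser.HTMLParser`, as below. The abstract does not say which meta tags are used. Keeping only `name="keywords"` and `name="description"` is an assumption made for illustration. The names `MetaTextExtractor` and `extract_meta_text` are hypothetical.

```python
from html.parser import HTMLParser


class MetaTextExtractor(HTMLParser):
    """Collects the `content` of selected <meta> tags in document order."""

    # Assumption: the abstract does not specify which meta tags are used;
    # keywords and description are chosen here purely for illustration.
    META_NAMES = frozenset({"keywords", "description"})

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; attribute values may be None.
        # Self-closing <meta ... /> tags also arrive here via handle_startendtag.
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = (attrs.get("name") or "").strip().lower()
        content = (attrs.get("content") or "").strip()
        if name in self.META_NAMES and content:
            self.parts.append(content)


def extract_meta_text(html: str) -> str:
    """Return one whitespace-joined string of meta text for a single page."""
    parser = MetaTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)
```

Each page's string would then be tokenized and mapped to integer ids for the word embedding layer. The abstract does not specify the tokenizer.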
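The following NumPy sketch shows how data could flow through the four layers named in the abstract. It assumes the layers are applied in the order listed: word embedding, Bi-LSTM, convolution, then Attention. The abstract gives no dimensions, kernel sizes or attention formulation, so the following choices are all assumptions:

- Additive attention pooling replaces TextCNN's usual max-pooling, applied per kernel size.
- A single sigmoid output gives the detection score.
- All sizes are arbitrary.

The weights are random and untrained, so the sketch only illustrates tensor shapes. Training would need a framework such as PyTorch or Keras. The class name `BiLstmCnnAttention` is hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()


class BiLstmCnnAttention:
    """Untrained, forward-only sketch: embedding -> BiLSTM -> 1-D conv -> attention -> sigmoid."""

    def __init__(self, vocab_size, emb_dim=32, hidden=16, filters=8,
                 kernel_sizes=(2, 3, 4), seed=0):
        rng = np.random.default_rng(seed)

        def init(*shape):
            return rng.normal(0.0, 0.1, shape)

        self.hidden = hidden
        self.kernel_sizes = kernel_sizes
        self.emb = init(vocab_size, emb_dim)
        # One (W, U, b) per LSTM direction; the 4*hidden columns hold the i, f, g, o gates.
        self.lstm = [(init(emb_dim, 4 * hidden), init(hidden, 4 * hidden), np.zeros(4 * hidden))
                     for _ in range(2)]
        self.convs = [(init(k, 2 * hidden, filters), np.zeros(filters)) for k in kernel_sizes]
        self.att = [(init(filters, filters), init(filters)) for _ in kernel_sizes]
        self.out_w = init(filters * len(kernel_sizes))
        self.out_b = 0.0

    def _lstm(self, xs, W, U, b):
        h = np.zeros(self.hidden)
        c = np.zeros(self.hidden)
        outs = []
        for x in xs:
            i, f, g, o = np.split(x @ W + h @ U + b, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
            h = sigmoid(o) * np.tanh(c)
            outs.append(h)
        return np.stack(outs)

    def forward(self, token_ids):
        if len(token_ids) < max(self.kernel_sizes):
            raise ValueError("sequence shorter than the largest convolution kernel")
        x = self.emb[np.asarray(token_ids)]                  # (T, emb_dim)
        fwd = self._lstm(x, *self.lstm[0])                   # (T, hidden)
        bwd = self._lstm(x[::-1], *self.lstm[1])[::-1]       # (T, hidden), back in time order
        seq = np.concatenate([fwd, bwd], axis=1)             # (T, 2*hidden)
        pooled = []
        for k, (Wc, bc), (Wa, va) in zip(self.kernel_sizes, self.convs, self.att):
            # Valid 1-D convolution: each window of k BiLSTM states -> `filters` features.
            fmap = np.stack([np.tensordot(seq[t:t + k], Wc, axes=([0, 1], [0, 1])) + bc
                             for t in range(len(seq) - k + 1)])
            fmap = np.maximum(fmap, 0.0)                     # ReLU, (T-k+1, filters)
            # Additive attention over time positions (assumed in place of max-pooling).
            weights = softmax(np.tanh(fmap @ Wa) @ va)       # (T-k+1,)
            pooled.append(weights @ fmap)                    # (filters,)
        features = np.concatenate(pooled)                    # (filters * len(kernel_sizes),)
        return float(sigmoid(features @ self.out_w + self.out_b))
```

For example, `BiLstmCnnAttention(vocab_size=100).forward([5, 17, 42, 8, 3, 99])` returns a score between 0 and 1. With untrained weights that score carries no meaning. Pooling with attention instead of max-pooling lets every window contribute in proportion to a learned relevance weight.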