We-media content risk control method and system based on multi-layer Trie and embedding

Bibliographic details
Main authors: CAO MENGJIA, ZHAO BIN, FAN SHUNGUO, YAO KAI
Format: Patent
Language: Chinese; English
Description
Abstract: The invention belongs to the technical field of big data processing and relates in particular to a self-media (we-media) content risk control method and system based on a multi-layer Trie and fused embeddings. The method comprises the following steps: constructing sensitive lexicons according to different risk levels; converting the words in the absolute sensitive lexicon into vectors, fusing those vectors into a final feature vector, and storing the final feature vector in a distributed vector store; building an inverted index for each risk word in the potential-risk lexicon; constructing a multi-layer sensitive-word Trie from the sensitive lexicons; and feeding the text to be checked into the multi-layer sensitive-word Trie for exact-match detection, combining this with embedding similarity computation to obtain the audit result. Because the method divides risk into levels, missed detection of potentially violating statements is effectively avoided, co
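
The combined detection described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The class and function names (`LeveledTrie`, `embed`, `cosine`, `audit`), the level labels `"absolute"` and `"potential"`, the outcomes `reject` / `review` / `pass`, and the threshold of 0.6 are all assumptions made for illustration. The character-bigram "embedding" is a toy stand-in for the fused feature vectors, a plain dictionary stands in for the distributed vector store, and the inverted index is omitted because the abstract does not say what it indexes.

```python
import math
from collections import Counter


class LeveledTrie:
    """Trie whose terminal nodes carry a risk level (hypothetical layout)."""

    def __init__(self):
        self.root = {}

    def add(self, word, level):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        # "$level" is longer than one character, so it cannot clash
        # with a child key, which is always a single character.
        node["$level"] = level

    def scan(self, text):
        """Return (start, word, level) for every lexicon word in text."""
        hits = []
        for start in range(len(text)):
            node = self.root
            for end in range(start, len(text)):
                node = node.get(text[end])
                if node is None:
                    break
                if "$level" in node:
                    hits.append((start, text[start:end + 1], node["$level"]))
        return hits


def embed(text):
    """Toy stand-in for a learned embedding: character-bigram counts."""
    padded = f" {text} "
    return Counter(padded[i:i + 2] for i in range(len(padded) - 1))


def cosine(a, b):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def audit(text, trie, absolute_vectors, threshold=0.6):
    """Combine exact Trie matching with embedding similarity (assumed policy)."""
    hits = trie.scan(text)
    if any(level == "absolute" for _, _, level in hits):
        return "reject", hits
    # No exact absolute hit: look for tokens that resemble an absolute word.
    near = [
        (token, word)
        for token in text.split()
        for word, vector in absolute_vectors.items()
        if cosine(embed(token), vector) >= threshold
    ]
    if near:
        return "review", near
    if hits:  # only potential-risk words matched
        return "review", hits
    return "pass", []


# Illustrative lexicons; the words themselves are made up for the example.
trie = LeveledTrie()
trie.add("gamble", "absolute")
trie.add("loan", "potential")
absolute_vectors = {"gamble": embed("gamble")}
```

With these lexicons, `audit("come gamble today", trie, absolute_vectors)` is rejected by the exact Trie match, while the obfuscated spelling in `audit("try gambl1e now", trie, absolute_vectors)` escapes the Trie but is flagged for review by the similarity check. This is the kind of missed detection that the combined approach is meant to reduce.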