Bad corpus filtering method and system

Bibliographic details
Main authors: CHENG KAILIN, ZHOU YUHAN, LIU KAI, JIANG XIAONING, XIE HONGMIN
Format: Patent
Language: Chinese; English
Description
Summary: The invention discloses a method and system for filtering bad corpora. The method comprises the following steps: obtaining a text corpus to be recognized and preprocessing it to obtain a basic text corpus; extracting entities from the basic text corpus and matching them against a bad text knowledge graph to obtain a first recognition result; detecting and recognizing the basic text corpus with a corpus recognition model to obtain a second recognition result; and filtering the text corpus to be recognized according to the first recognition result and/or the second recognition result, and updating the bad text knowledge graph according to the second recognition result. According to the method, bad texts are screened through knowledge graph technology, and a plurality of candidate bad entities can be obtained by utilizing semantic network essence and strong association capability of th
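
The steps named in the summary can be sketched roughly as follows. This is only an illustrative outline, not the patented implementation: the record does not say how entities are extracted, how the knowledge graph is stored, or what the corpus recognition model is. The names BadTextGraph, preprocess, extract_entities, and filter_corpus are hypothetical, the graph is a plain adjacency dictionary, entity extraction is naive tokenization, and the model is assumed to be any callable that returns the set of entities it judges bad.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Set


@dataclass
class BadTextGraph:
    """Toy stand-in for the bad text knowledge graph (assumption):
    each known bad entity maps to the entities it is associated with."""
    edges: Dict[str, Set[str]] = field(default_factory=dict)

    def add(self, entity: str, related: Iterable[str] = ()) -> None:
        related = set(related)
        self.edges.setdefault(entity, set()).update(related)
        for other in related:
            self.edges.setdefault(other, set()).add(entity)

    def match(self, entities: Set[str]) -> Set[str]:
        # Matching search: which extracted entities are already in the graph?
        return {e for e in entities if e in self.edges}


def preprocess(text: str) -> str:
    # Assumed preprocessing: collapse whitespace and lowercase.
    return re.sub(r"\s+", " ", text).strip().lower()


def extract_entities(text: str) -> Set[str]:
    # Placeholder for real entity extraction: plain alphanumeric tokens.
    return set(re.findall(r"[a-z0-9]+", text))


def filter_corpus(
    corpus: List[str],
    graph: BadTextGraph,
    model: Callable[[str], Set[str]],
    mode: str = "or",
) -> List[str]:
    """Return the texts that are kept. `mode` selects whether a text is
    dropped when either result flags it ("or") or only when both do ("and")."""
    kept = []
    for raw in corpus:
        base = preprocess(raw)
        first = graph.match(extract_entities(base))  # first recognition result
        second = model(base)                         # second recognition result
        if mode == "and":
            flagged = bool(first) and bool(second)
        else:
            flagged = bool(first) or bool(second)
        # Update the graph from the second result, linking each newly
        # recognized entity to the known entities found in the same text.
        for entity in second:
            graph.add(entity, first - {entity})
        if not flagged:
            kept.append(raw)
    return kept
```

With a graph seeded with one known entity and a toy model that flags tokens by prefix, the function drops texts caught by either check, and entities reported by the model become available to the graph lookup for later texts.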