Bad corpus filtering method and system
Format: Patent
Language: Chinese; English
Abstract: The invention discloses a bad corpus filtering method and system. The method comprises the following steps: obtaining a text corpus to be recognized and preprocessing it to obtain a basic text corpus; extracting entities from the basic text corpus and performing a matching search on those entities against a bad text knowledge graph to obtain a first recognition result; detecting and recognizing the basic text corpus with a corpus recognition model to obtain a second recognition result; and filtering the text corpus to be recognized according to the first recognition result and/or the second recognition result, and updating the bad text knowledge graph according to the second recognition result. The method screens bad texts using knowledge graph technology, and a plurality of candidate bad entities can be obtained by exploiting the semantic-network nature and strong association capability of th…
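The steps listed in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the preprocessing rules, the whitespace tokenizer standing in for entity extraction (Chinese text would additionally need word segmentation), the adjacency-set graph, the keyword-based `toy_model` standing in for the corpus recognition model, and every name (`preprocess`, `extract_entities`, `BadTextKnowledgeGraph`, `filter_corpus`, `mode`) are hypothetical. Graph matching returns hits plus their neighbours to mimic how associations yield candidate bad entities, and `mode` selects whether the two recognition results are combined with "or" or "and".

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Optional


def preprocess(text: str) -> str:
    # Hypothetical preprocessing: lowercase, replace punctuation with spaces,
    # collapse whitespace. The result plays the role of the "basic text corpus".
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def extract_entities(basic_text: str) -> list[str]:
    # Stand-in for real entity extraction (e.g. NER plus word segmentation):
    # here every whitespace-separated token counts as an entity.
    return basic_text.split()


@dataclass
class BadTextKnowledgeGraph:
    # Hypothetical structure: each bad entity maps to its associated entities.
    edges: dict[str, set[str]] = field(default_factory=dict)

    def add(self, entity: str, related: Optional[set[str]] = None) -> None:
        # Insert an entity and undirected association edges to `related`.
        neighbours = (related or set()) - {entity}
        self.edges.setdefault(entity, set()).update(neighbours)
        for other in neighbours:
            self.edges.setdefault(other, set()).add(entity)

    def match(self, entities: list[str]) -> set[str]:
        # Matching search: entities found in the graph plus their neighbours,
        # i.e. candidate bad entities reached through associations.
        hits: set[str] = set()
        for entity in entities:
            if entity in self.edges:
                hits.add(entity)
                hits |= self.edges[entity]
        return hits


def filter_corpus(
    texts: list[str],
    graph: BadTextKnowledgeGraph,
    model: Callable[[str], set[str]],
    mode: str = "or",
) -> list[str]:
    # `model` returns the entities it considers bad; an empty set means "clean".
    if mode not in ("or", "and"):
        raise ValueError("mode must be 'or' or 'and'")
    kept = []
    for raw in texts:
        basic = preprocess(raw)
        first = bool(graph.match(extract_entities(basic)))  # first result
        flagged = model(basic)
        second = bool(flagged)  # second result
        is_bad = (first or second) if mode == "or" else (first and second)
        # Update the graph from the second result: link the flagged entities.
        for entity in flagged:
            graph.add(entity, flagged)
        if not is_bad:
            kept.append(raw)
    return kept


if __name__ == "__main__":
    graph = BadTextKnowledgeGraph()
    graph.add("gambling", {"casino"})

    def toy_model(basic_text: str) -> set[str]:
        # Hypothetical keyword "model" standing in for a trained classifier.
        return {w for w in basic_text.split() if w == "scam"}

    texts = ["Visit the CASINO tonight!", "Nice weather today.", "This is a scam offer"]
    print(filter_corpus(texts, graph, toy_model))  # ['Nice weather today.']
    print("scam" in graph.edges)  # True
```

In this sketch the graph update simply adds whatever entities the model flagged, so the third text teaches the graph about "scam" for later inputs. A production system would likely gate such updates (confidence thresholds, review), which the abstract does not specify.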