Power grid domain phrase identification and classification method and system based on Baidu encyclopedia
Main Authors: | , , , , , , , , , , , , , |
---|---|
Format: | Patent |
Language: | Chinese ; English |
Subjects: | |
Online Access: | Order full text |
Summary: | The invention discloses a method and system, based on Baidu Encyclopedia, for identifying and classifying power grid domain phrases. The method comprises: extracting from a given corpus C the phrases whose occurrence frequency is greater than or equal to a threshold t as high-frequency candidate phrases; filtering redundant phrases out of the extracted high-frequency candidate phrases; crawling from Baidu Encyclopedia on the Internet the entry explanations corresponding to the high-frequency candidate phrases that remain after filtering; removing as invalid phrases the high-frequency candidate phrases for which no entry explanation can be crawled, and retaining as valid phrases those for which an entry explanation can be crawled; and recognizing and classifying the high-frequency candidate phrases regarded as valid phrases through a pre-trained power grid domain phrase recognition and classification model, and out |
---|
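
The steps named in the summary can be illustrated with a minimal Python sketch. It is not the patented implementation. The abstract does not say how phrases are segmented, what counts as a redundant phrase, how entries are crawled, or what the classification model looks like, so everything below is an assumption: the corpus is taken to be already split into whitespace-separated tokens, a phrase is treated as redundant when a longer candidate contains it with the same frequency, and `lookup` and `classify` are hypothetical callables standing in for the Baidu Encyclopedia crawler and the pre-trained model.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Optional


def extract_high_frequency_phrases(
    corpus: Iterable[str], t: int, max_len: int = 3
) -> Dict[str, int]:
    """Count word n-grams (1..max_len tokens) and keep those with frequency >= t.

    Assumption: each corpus item is already segmented into space-separated tokens.
    """
    counts: Counter = Counter()
    for sentence in corpus:
        tokens = sentence.split()
        for n in range(1, max_len + 1):
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    return {phrase: c for phrase, c in counts.items() if c >= t}


def filter_redundant(phrases: Dict[str, int]) -> Dict[str, int]:
    """Drop a phrase if a longer candidate contains it with the same frequency.

    Assumption: this is one plausible redundancy rule, chosen for illustration.
    """
    kept = {}
    for p, c in phrases.items():
        padded = f" {p} "  # pad so matches respect token boundaries
        redundant = any(
            q != p and padded in f" {q} " and phrases[q] == c for q in phrases
        )
        if not redundant:
            kept[p] = c
    return kept


def identify_and_classify(
    corpus: Iterable[str],
    t: int,
    lookup: Callable[[str], Optional[str]],  # hypothetical: entry explanation or None
    classify: Callable[[str, str], str],     # hypothetical: stands in for the trained model
) -> Dict[str, str]:
    """Run the pipeline: extract, filter, look up entries, classify valid phrases."""
    candidates = filter_redundant(extract_high_frequency_phrases(corpus, t))
    results = {}
    for phrase in candidates:
        explanation = lookup(phrase)
        if explanation is None:
            continue  # no encyclopedia entry: treated as an invalid phrase
        results[phrase] = classify(phrase, explanation)
    return results
```

For a quick trial, `lookup` can be the `get` method of a small dictionary of entry explanations and `classify` a function returning a fixed label. A real system would replace both with a crawler and a trained model.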