Power grid domain phrase identification and classification method and system based on Baidu encyclopedia

Bibliographic details
Main authors: CHEN YING, LIU XINGWEI, PI JUNBO, XU ZHENGQI, WU KUN, FAN SHIXIONG, LIAO ZHIFANG, LI BIN, LI ZEKE, LIN JINGHUAI, FAN HAIWEI, WANG JING, HAN YE, FENG CHANGYOU
Format: Patent
Language: Chinese; English
Description
Summary: The invention discloses a method and system, based on Baidu encyclopedia, for identifying and classifying phrases in the power grid domain. The method comprises the steps of: extracting from a given corpus C the phrases whose occurrence frequency is greater than or equal to a threshold t, as high-frequency candidate phrases; filtering redundant phrases out of the extracted high-frequency candidate phrases; crawling from Baidu encyclopedia on the Internet the entry explanations corresponding to the high-frequency candidate phrases that remain after filtering; removing as illegal phrases the high-frequency candidate phrases for which no entry explanation can be crawled, and retaining as legal phrases those for which an entry explanation can be crawled; and recognizing and classifying the high-frequency candidate phrases regarded as legal phrases through a pre-trained power grid domain phrase recognition and classification model, and out
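The first steps named in the summary (frequency thresholding, redundant phrase filtering, and the legal/illegal split) can be illustrated with a minimal Python sketch. It is not the patented implementation. The record does not say how candidate phrases are formed, what counts as redundant, or how the crawl is performed, so the following are assumptions made for illustration: candidates are token n-grams up to a hypothetical `max_len`; a phrase is treated as redundant when a longer candidate contains it with the same frequency; and the encyclopedia check is passed in as a caller-supplied `has_entry` function rather than a real crawler. The final classification step depends on a pre-trained model that the record does not describe, so it is omitted.

```python
from collections import Counter
from typing import Callable, Iterable


def high_frequency_phrases(corpus: Iterable[list[str]], t: int, max_len: int = 3) -> Counter:
    """Count token n-grams (n = 1..max_len) and keep those with frequency >= t.

    Using n-grams as candidate phrases is an assumption of this sketch.
    """
    counts: Counter = Counter()
    for tokens in corpus:
        for n in range(1, max_len + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return Counter({p: c for p, c in counts.items() if c >= t})


def _contains(longer: tuple, shorter: tuple) -> bool:
    """True if `shorter` occurs as a contiguous run inside `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def filter_redundant(phrases: Counter) -> Counter:
    """Assumed redundancy rule: drop a phrase when a longer candidate
    contains it and occurs exactly as often."""
    kept: Counter = Counter()
    for p, c in phrases.items():
        redundant = any(
            len(q) > len(p) and phrases[q] == c and _contains(q, p)
            for q in phrases
        )
        if not redundant:
            kept[p] = c
    return kept


def keep_legal(phrases: Iterable[tuple], has_entry: Callable[[str], bool], sep: str = "") -> list[str]:
    """Keep phrases for which `has_entry` reports an encyclopedia entry.

    `has_entry` is a hypothetical stand-in for the crawl step.
    """
    joined = (sep.join(p) for p in phrases)
    return [s for s in joined if has_entry(s)]
```

For Chinese text the tokens would typically be joined with the default empty separator; passing `sep=" "` suits space-delimited languages. In practice `has_entry` would wrap whatever lookup against the encyclopedia is available, which keeps the filtering logic testable without network access.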