Method for collecting data of word segmentation dictionary based on statistical machine learning method


Bibliographic details
Main authors: SHI DUNSHI, CHENG JIELING, ZHANG DANING, JI JIANGTAO, ZHANG XIAOKUN, ZHOU JIANG, QIN YULIN, MIN XINLI, ZHANG YU, XUE JUNZHI, MA WEIHUA, ZHANG GUOJUN
Format: Patent
Language: Chinese; English
Description
Summary: The invention relates to the field of basic data processing, and specifically to a method for collecting data for a word segmentation dictionary based on statistical machine learning. The method applies machine learning and uses a classification approach to acquire domain concepts, treating domain concept acquisition as a binary classification problem. The acquired concepts are then used to process the collected information or data, to build an information database and an index database, and to form the data content the user wants. The system responds to the various kinds of retrieval requests a user submits and provides the required information or relevant pointers, which increases the accuracy of information retrieval.
本发明涉及数据处理基础领域,具体来说是种基于统计机器学习方法的分词字典数据采集方法,利用机器学习的方法,采用分类思想获取领域概念,把领域概念获取问题看成是个二值分类问题,进行概念的获取及处理,从而对采集信息或数据进行加工,建立信息数据库和索引数据库,形成用户想要的数据内容,对用户提出的各种检索做出响应,为提供用户所需的信息或相关指针,从而提高了信息检索的准确率和准确率。
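The summary frames domain concept acquisition as a binary classification problem but does not name a classifier, features, or training data. The following is a minimal sketch of that framing under stated assumptions, not the patented method: the toy term counts, the single log-ratio feature, the logistic-regression model, and all function names are hypothetical choices made only for illustration.

```python
import math

# Hypothetical toy counts: occurrences of each candidate term in a
# domain corpus and in a general-language corpus (assumed data).
DOMAIN_COUNTS = {"tokenizer": 9, "lexicon": 7, "corpus": 8, "n-gram": 6,
                 "the": 10, "and": 9, "very": 3}
GENERAL_COUNTS = {"tokenizer": 0, "lexicon": 1, "corpus": 1, "n-gram": 0,
                  "the": 200, "and": 150, "very": 40}


def features(term):
    """Return [bias, log-ratio of smoothed domain vs. general frequency]."""
    d = DOMAIN_COUNTS.get(term, 0)
    g = GENERAL_COUNTS.get(term, 0)
    return [1.0, math.log((d + 1) / (g + 1))]


def predict_proba(weights, x):
    """Logistic model: probability that the term is a domain concept."""
    z = sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, labels, epochs=200, lr=0.1):
    """Fit logistic-regression weights with plain stochastic gradient steps."""
    weights = [0.0] * len(samples[0])
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = y - predict_proba(weights, x)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
    return weights


# Labeled seed terms: 1 = domain concept, 0 = not a domain concept.
seed_terms = ["tokenizer", "lexicon", "the", "and"]
seed_labels = [1, 1, 0, 0]
weights = train([features(t) for t in seed_terms], seed_labels)

# Classify unseen candidates; accepted terms are collected into the dictionary.
candidates = ["corpus", "very", "n-gram"]
dictionary = sorted(t for t in candidates
                    if predict_proba(weights, features(t)) >= 0.5)
print(dictionary)
```

In this sketch, terms that are much more frequent in the domain corpus than in general text are accepted as dictionary entries, while common function words are rejected. A real system would presumably use richer features and far more labeled data.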