Intelligent Information extraction algorithm of Agricultural text based on Machine Learning method
Published in: | Journal of physics. Conference series 2021-06, Vol.1952 (2), p.22073 |
---|---|
Main authors: | , , , |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Full text |
Summary: | Internet question-and-answer platforms for agricultural technology currently rely entirely on human experts to answer questions, so responses are slow and answer quality is hard to guarantee. To enable intelligent question answering for agricultural technology and to build an agricultural technology knowledge base, “crop-pest-pesticide” named entity triples must be extracted from existing question-and-answer data. Research on Chinese named entity recognition in the agricultural domain is scarce, and its accuracy is low. Based on the characteristics of crop, disease-and-pest, and pesticide named entities and on agricultural technology question-and-answer data, a conditional random field (CRF) method for recognizing these three kinds of named entities is proposed. The data set is formatted and segmented into words automatically, and each segmented word is then tagged automatically according to whether it contains a specific definition word, whether it contains a specific radical, whether it is a quantifier, whether it is a specific left or right definition word, and its part of speech. A CRF model trained on the tagged data classifies each word: it judges whether the word belongs to a crop, pest, or pesticide named entity and identifies the word's position within a compound named entity, so that the three kinds of named entities are recognized and the associated triples are constructed automatically. Selecting the feature combination and adjusting the context window size experimentally improves recognition accuracy and reduces model training time. The reported recognition accuracy for crop, pest, and pesticide named entities is 97.72%, 87.63%, and 98.05%, respectively, which the authors describe as significantly higher than existing methods. |
---|---|
ISSN: | 1742-6588 1742-6596 |
DOI: | 10.1088/1742-6596/1952/2/022073 |
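
The summary above describes per-word feature tagging, CRF labelling, and triple construction without giving details. The following is a minimal Python sketch of those steps, not the authors' implementation. The cue lists `DEFINITION_CHARS` and `RADICAL_CHARS`, the quantifier tag `"q"`, the BIO label names, and the Cartesian-product rule in `build_triples` are all assumptions made for illustration. CRF training itself is omitted; feature dictionaries of this shape are the kind of input that CRF toolkits such as sklearn-crfsuite typically accept.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag)

# Hypothetical cue lists for illustration; the record does not give the real ones.
DEFINITION_CHARS = set("病虫剂灵")       # assumed characters typical of pest/pesticide names
RADICAL_CHARS = set("稻菜茄蚜螟病疫")    # assumed characters carrying crop/pest-related radicals
QUANTIFIER_POS = "q"                     # assumed POS tag for quantifiers


def token_features(tokens: List[Token], i: int, window: int = 1) -> Dict[str, object]:
    """Build a feature dict for token i, including a symmetric context window."""
    word, pos = tokens[i]
    feats: Dict[str, object] = {
        "word": word,
        "pos": pos,
        "has_definition_word": any(c in DEFINITION_CHARS for c in word),
        "has_radical": any(c in RADICAL_CHARS for c in word),
        "is_quantifier": pos == QUANTIFIER_POS,
    }
    for off in range(-window, window + 1):
        if off == 0:
            continue
        j = i + off
        if 0 <= j < len(tokens):
            feats[f"{off:+d}:word"] = tokens[j][0]
            feats[f"{off:+d}:pos"] = tokens[j][1]
        else:
            feats[f"{off:+d}:pad"] = True  # position falls outside the sentence
    return feats


def decode_entities(words: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Merge B-/I- labelled words into (entity_type, text) pairs."""
    entities: List[Tuple[str, str]] = []
    current_type, current = None, []
    for word, label in zip(words, labels):
        if label.startswith("B-"):
            if current:
                entities.append((current_type, "".join(current)))
            current_type, current = label[2:], [word]
        elif label.startswith("I-") and current_type == label[2:]:
            current.append(word)
        else:
            if current:
                entities.append((current_type, "".join(current)))
            current_type, current = None, []
    if current:
        entities.append((current_type, "".join(current)))
    return entities


def build_triples(entities: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Assumed rule: pair every crop, pest, and pesticide found in one Q&A item."""
    crops = [text for kind, text in entities if kind == "CROP"]
    pests = [text for kind, text in entities if kind == "PEST"]
    pesticides = [text for kind, text in entities if kind == "PESTICIDE"]
    return [(c, p, d) for c in crops for p in pests for d in pesticides]


# Hand-written example; a trained CRF would predict these labels.
tokens = [("水稻", "n"), ("稻", "n"), ("飞虱", "n"), ("用", "p"), ("吡虫啉", "n")]
labels = ["B-CROP", "B-PEST", "I-PEST", "O", "B-PESTICIDE"]
entities = decode_entities([w for w, _ in tokens], labels)
print(build_triples(entities))
```

Increasing `window` adds more neighbouring words and tags to each feature dict, which is the knob the summary refers to as the context window size.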