Diagnosing crop diseases based on domain-adaptive pre-training BERT of electronic medical records

Bibliographic details
Published in: Applied Intelligence (Dordrecht, Netherlands), 2023-06, Vol. 53 (12), pp. 15979-15992
Main authors: Ding, Junqi; Li, Bo; Xu, Chang; Qiao, Yan; Zhang, Lingxian
Format: Article
Language: English
Description
Abstract: Crop Electronic Medical Records (CEMRs) contain rich and diverse information about disease characteristics, which is highly valuable for supporting plant doctors in diagnosing disease. However, mining CEMRs presents challenges that remain unstudied, such as the lack of publicly available datasets, unlabeled data, and the various agricultural and slang terms in the text. This study proposes a crop disease diagnosis model, CdsBERT-RCNN, which combines Bidirectional Encoder Representations from Transformers adapted to the crop disease domain (CdsBERT) with a recurrent convolutional neural network (RCNN). First, a crop disease corpus is constructed for domain-adaptive pre-training (DPAT); second, CdsBERT extracts semantic features from the CEMRs; third, the RCNN further extracts distinct contextual information and performs the disease diagnosis. A CEMR dataset covering 32 diseases was constructed to validate the model. Experiments showed that the proposed method effectively diagnoses crop diseases, with an F1-score of 85.63% and an accuracy of 85.65%. It outperformed widely used neural network models (CNN, DPCNN, RCNN, RNN, attention-based RNN, FastText, and Transformer), owing to the additional information obtained through self-supervised pre-training, and it outperformed general-domain pre-trained language models (BERT, ERNIE, XLNet, and RoBERTa), owing to a data distribution better suited to the crop disease domain and effective fine-tuning strategies. Ablation studies further demonstrate the value of DPAT and the RCNN in the model. The results demonstrate the effectiveness of the framework for CEMR-based disease diagnosis, with potential applications in electronic medical record systems and artificial intelligence for crop disease management.
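The abstract reports both an F1-score and an accuracy for a 32-class diagnosis task but does not state how the F1-score is averaged across classes. The sketch below assumes macro averaging (per-class F1 averaged without weighting) to show how the two metrics are computed from predicted and true disease labels. The function names and the disease labels in the usage example are hypothetical and are not taken from the paper.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of records whose predicted disease equals the true disease."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumption: macro averaging).

    Per-class F1 = 2*TP / (2*TP + FP + FN), which equals the harmonic
    mean of that class's precision and recall.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p wrongly
            fn[t] += 1  # missed the true class t
    labels = set(y_true) | set(y_pred)
    scores = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels for four CEMRs (not from the paper's dataset).
y_true = ["blight", "blight", "mildew", "rust"]
y_pred = ["blight", "mildew", "mildew", "rust"]
print(accuracy(y_true, y_pred))  # 0.75
print(macro_f1(y_true, y_pred))  # 7/9, about 0.778
```

Because macro averaging gives every disease equal weight, a model that does well on frequent diseases but poorly on rare ones can show a macro-F1 noticeably below its accuracy; the close agreement of the two reported figures (85.63% vs. 85.65%) is consistent with fairly even per-class performance, though the paper's actual averaging scheme is not stated in the abstract.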
ISSN: 0924-669X, 1573-7497
DOI: 10.1007/s10489-022-04346-x