VEG-MMKG: Multimodal knowledge graph construction for vegetables based on pre-trained model extraction

Bibliographic Details
Published in: Computers and Electronics in Agriculture, 2024-11, Vol. 226, p. 109398, Article 109398
Main Authors: Lv, Bowen; Wu, Huarui; Chen, Wenbai; Chen, Cheng; Miao, Yisheng; Zhao, Chunjiang
Format: Article
Language: English
Online Access: Full text
Description
Summary:

Highlights:
• Agricultural multimodal information management and knowledge support.
• A pre-trained model extracts entities and relationships from image-text pairs.
• A method for constructing a vegetable multimodal knowledge graph is proposed.

Knowledge graph technology is of great significance to modern agricultural information management and data-driven decision support. However, agricultural knowledge spans many types, and agricultural knowledge graph databases built from text alone do not give users an intuitive, comprehensive grasp of that knowledge. In view of this, this paper proposes a method for extracting knowledge and constructing an agricultural multimodal knowledge graph using a pre-trained language model, taking two crops, cabbage and corn, as research objects. First, a text-image collaborative representation learning method with a two-stream structure combines the image-modality information of vegetables with the text-modality information, exploiting the correlation and complementarity between the two to achieve entity alignment. In addition, to address the high similarity among vegetable entities within fine-grained categories, a cross-modal fine-grained contrastive learning method is introduced: contrasting vocabulary items against small image regions compensates for the insufficient semantic association between modalities. Finally, a visual multimodal knowledge graph user interface is built from the image-text matching results. Experimental results show that the fine-tuned pre-trained model achieves an image-text matching rate of 76.7% on the vegetable dataset and can match appropriate images to text entities. The resulting visual multimodal knowledge graph database allows users to query and filter knowledge according to their needs, supporting subsequent research on applications in specific fields such as multimodal agricultural question answering, crop pest and disease identification, and agricultural product recommendation.
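The fine-grained contrastive step described above pairs vocabulary items with small image regions. As a rough illustration of how such a token-region objective is commonly set up, the PyTorch sketch below computes an InfoNCE-style loss over a batch of caption-image pairs; the function name, the max-over-regions/mean-over-tokens aggregation, and the temperature value are illustrative assumptions, not the authors' actual implementation.

```python
# Minimal sketch of a fine-grained cross-modal contrastive objective,
# in the spirit of the paper's vocabulary / image-region alignment.
# All names and dimensions are illustrative assumptions.
import torch
import torch.nn.functional as F

def fine_grained_contrastive_loss(word_emb, region_emb, temperature=0.07):
    """
    word_emb:   (B, T, D) token embeddings from the text stream
    region_emb: (B, R, D) region embeddings from the image stream
    Returns a symmetric InfoNCE loss over the batch, where each
    text-image pair is scored by a max-over-regions, mean-over-tokens
    aggregation of token-region similarities.
    """
    word_emb = F.normalize(word_emb, dim=-1)
    region_emb = F.normalize(region_emb, dim=-1)

    # Token-to-region similarity for every (text i, image j) pair:
    # (B, T, D) x (B, R, D) -> (B, B, T, R)
    sim = torch.einsum('itd,jrd->ijtr', word_emb, region_emb)

    # Each token attends to its best-matching region; averaging over
    # tokens yields one scalar score per (text, image) pair.
    pair_scores = sim.max(dim=-1).values.mean(dim=-1)  # (B, B)

    logits = pair_scores / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Symmetric loss: text-to-image and image-to-text directions.
    loss_t2i = F.cross_entropy(logits, targets)
    loss_i2t = F.cross_entropy(logits.t(), targets)
    return (loss_t2i + loss_i2t) / 2

# Example usage with random features standing in for encoder outputs:
words = torch.randn(4, 12, 256)    # 4 captions, 12 tokens each
regions = torch.randn(4, 36, 256)  # 36 detected regions per image
loss = fine_grained_contrastive_loss(words, regions)
```

Taking the maximum over regions lets each word select its best-matching image patch, which is one common way to realize fine-grained cross-modal alignment without explicit region-level labels.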
ISSN: 0168-1699
DOI: 10.1016/j.compag.2024.109398