KBHN: A knowledge-aware bi-hypergraph network based on visual-knowledge features fusion for teaching image annotation

Detailed description

Bibliographic details
Published in:	Information processing & management 2023-01, Vol.60 (1), p.103106, Article 103106
Main authors:	Li, Hao; Wang, Jing; Du, Xu; Hu, Zhuang; Yang, Shuoqiu
Format:	Article
Language:	English
Subjects:	Bi-hypergraph network; Intelligent education; Knowledge hypergraph; Teaching image annotation; Visual-knowledge features fusion; Visual-knowledge inconsistency
Online access:	Full text

Description
Abstract:	Teaching images, as an important auxiliary tool in teaching and learning, are fundamentally different from general-domain images. Besides visually similar images being more likely to share common labels, teaching images also face the challenge of visual-knowledge inconsistency, including intra-knowledge visual difference and inter-knowledge visual similarity. To address these challenges, we present KBHN, a knowledge-aware bi-hypergraph network, which not only considers coarse-grained visual features but also extracts fine-grained knowledge features that reflect the knowledge intention hidden in teaching images. In detail, a visual hypergraph is constructed to connect visually similar images. It further enriches coarse-grained visual features by modeling the high-order visual relations among teaching images. Moreover, a knowledge hypergraph based on typical images is built to aggregate images with similar knowledge information; it extracts fine-grained knowledge features by modeling high-order knowledge correlations between local regions. Furthermore, a multi-head attention mechanism is adopted to fuse the visual and knowledge features and enrich the image representation. A teaching image dataset containing 20,744 real-world images annotated with 24 knowledge points is constructed to train and validate the model. Experimental results demonstrate that KBHN, incorporating visual-knowledge features, achieves state-of-the-art performance compared to existing methods.
ISSN:	0306-4573 1873-5371
DOI:	10.1016/j.ipm.2022.103106
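
The abstract describes a visual hypergraph that connects visually similar images and models high-order relations among them, but this record gives no construction details. As a rough illustration only, the sketch below builds a k-nearest-neighbour hypergraph over image feature vectors and applies one step of the widely used normalized hypergraph propagation (the HGNN-style rule with unit hyperedge weights). The function names `knn_hypergraph` and `hypergraph_propagate`, the choice of k, the Euclidean distance, and the toy features are all assumptions made for this example; none of them is confirmed as the KBHN authors' method, and the knowledge hypergraph and attention fusion are not covered.

```python
import numpy as np


def knn_hypergraph(features, k):
    """Build an incidence matrix H of shape (n_nodes, n_edges).

    Hypothetical construction: hyperedge j groups node j with its k
    nearest neighbours under Euclidean distance.
    """
    x = np.asarray(features, dtype=float)
    n = x.shape[0]
    # Pairwise squared Euclidean distances.
    sq = np.sum(x ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    # Force each node to rank first in its own neighbour list.
    np.fill_diagonal(dist, -np.inf)
    nearest = np.argsort(dist, axis=1)[:, : k + 1]
    H = np.zeros((n, n))
    for j in range(n):
        H[nearest[j], j] = 1.0  # column j is the hyperedge centred on node j
    return H


def hypergraph_propagate(H, X):
    """One smoothing step: Dv^-1/2 H De^-1 H^T Dv^-1/2 X (unit edge weights)."""
    X = np.asarray(X, dtype=float)
    dv = H.sum(axis=1)  # node degrees (each node is in at least its own edge)
    de = H.sum(axis=0)  # hyperedge degrees (k + 1 members each)
    dv_inv_sqrt = 1.0 / np.sqrt(dv)
    left = (dv_inv_sqrt[:, None] * H) / de[None, :]
    right = H.T * dv_inv_sqrt[None, :]
    return (left @ right) @ X


# Toy data: two well-separated clusters standing in for image features.
features = np.array(
    [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
     [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
)
H = knn_hypergraph(features, k=2)
smoothed = hypergraph_propagate(H, features)
```

In this toy case every hyperedge stays inside its own cluster, so the propagation step replaces each feature vector with the mean of its cluster. A learned model would typically follow the step with a trainable weight matrix and a nonlinearity.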