Jointly Image Annotation and Classification Based on Supervised Multi-Modal Hierarchical Semantic Model
Published in: | Pattern Recognition and Image Analysis, 2020, Vol. 30 (1), pp. 76–86 |
Main authors: | , , |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Full text |
Abstract: | Many applications involve capturing correlations from multi-modal data, where the available information spans multiple modalities such as text, images, or speech. In this paper, we focus on the specific case in which images are both labeled with a category and annotated with free text, and we develop a supervised multi-modal hierarchical semantic model (smHSM) that incorporates image classification into the joint modeling of visual and textual information for the tasks of image annotation and classification. To evaluate the effectiveness of our model, we run experiments on two datasets and compare against traditional models. The results demonstrate the effectiveness and advantages of our model in terms of caption perplexity, classification accuracy, and image annotation accuracy. |
ISSN: | 1054-6618 (print); 1555-6212 (online) |
DOI: | 10.1134/S1054661820010058 |
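
The abstract above reports results in terms of caption perplexity. As a point of reference, here is a minimal Python sketch of how caption perplexity is conventionally computed from a model's per-word predictive probabilities; the function names and example numbers are hypothetical, and this is not the authors' smHSM implementation.

```python
import math

def caption_perplexity(word_probs):
    """Perplexity of a single caption.

    word_probs: the model's probability for each word of the caption,
    i.e. P(w_n | image, w_1..w_{n-1}). Perplexity is
    exp(-(1/N) * sum(log p_n)); lower means the model fits the caption better.
    """
    log_likelihood = sum(math.log(p) for p in word_probs)
    return math.exp(-log_likelihood / len(word_probs))

def corpus_perplexity(per_caption_probs):
    """Corpus-level perplexity: total log-likelihood normalized by total word count."""
    total_log = sum(math.log(p) for probs in per_caption_probs for p in probs)
    total_words = sum(len(probs) for probs in per_caption_probs)
    return math.exp(-total_log / total_words)

if __name__ == "__main__":
    # Hypothetical per-word probabilities for the captions of two annotated images.
    probs = [[0.20, 0.05, 0.10], [0.30, 0.02]]
    print(caption_perplexity(probs[0]))  # perplexity of the first caption
    print(corpus_perplexity(probs))      # perplexity across both captions
```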