On the Effectiveness of Images in Multi-modal Text Classification: An Annotation Study
Combining different input modalities beyond text is a key challenge for natural language processing. Previous work has been inconclusive as to the true utility of images as a supplementary information source for text classification tasks, motivating this large-scale human study of labelling performa...
Gespeichert in:
Veröffentlicht in: | ACM transactions on Asian and low-resource language information processing 2023-03, Vol.22 (3), p.1-19, Article 79 |
---|---|
Hauptverfasser: | , , , , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Combining different input modalities beyond text is a key challenge for natural language processing. Previous work has been inconclusive as to the true utility of images as a supplementary information source for text classification tasks, motivating this large-scale human study of labelling performance given text-only, images-only, or both text and images. To this end, we create a new dataset accompanied with a novel annotation method—Japanese Entity Labeling with Dynamic Annotation—to deepen our understanding of the effectiveness of images for multi-modal text classification. By performing careful comparative analysis of human performance and the performance of state-of-the-art multi-modal text classification models, we gain valuable insights into differences between human and model performance, and the conditions under which images are beneficial for text classification. |
---|---|
ISSN: | 2375-4699 2375-4702 |
DOI: | 10.1145/3565572 |