IRFL: Image Recognition of Figurative Language
Figures of speech such as metaphors, similes, and idioms are integral parts of human communication. They are ubiquitous in many forms of discourse, allowing people to convey complex, abstract ideas and evoke emotion. As figurative forms are often conveyed through multiple modalities (e.g., both text...
Gespeichert in:
Hauptverfasser: | , , |
---|---|
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Figures of speech such as metaphors, similes, and idioms are integral parts
of human communication. They are ubiquitous in many forms of discourse,
allowing people to convey complex, abstract ideas and evoke emotion. As
figurative forms are often conveyed through multiple modalities (e.g., both
text and images), understanding multimodal figurative language is an important
AI challenge, weaving together profound vision, language, commonsense and
cultural knowledge. In this work, we develop the Image Recognition of
Figurative Language (IRFL) dataset. We leverage human annotation and an
automatic pipeline we created to generate a multimodal dataset, and introduce
two novel tasks as a benchmark for multimodal figurative language
understanding. We experimented with state-of-the-art vision and language models
and found that the best (22%) performed substantially worse than humans (97%).
We release our dataset, benchmark, and code, in hopes of driving the
development of models that can better understand figurative language. |
---|---|
DOI: | 10.48550/arxiv.2303.15445 |