MemeCLIP: Leveraging CLIP Representations for Multimodal Meme Classification
Format: Article
Language: English
Online access: Order full text
Abstract: The complexity of text-embedded images presents a formidable challenge in machine learning, given the need for multimodal understanding of the multiple aspects of expression they convey. While previous research in multimodal analysis has primarily focused on singular aspects such as hate speech and its subclasses, this study expands the focus to encompass multiple aspects of linguistics: hate, targets of hate, stance, and humor. We introduce PrideMM, a novel dataset comprising 5,063 text-embedded images associated with the LGBTQ+ Pride movement, thereby addressing a serious gap in existing resources. We conduct extensive experiments on PrideMM using unimodal and multimodal baseline methods to establish benchmarks for each task. Additionally, we propose MemeCLIP, a novel framework for efficient downstream learning that preserves the knowledge of the pre-trained CLIP model. Our experimental results show that MemeCLIP achieves superior performance compared to previously proposed frameworks on two real-world datasets. We further compare the performance of MemeCLIP and zero-shot GPT-4 on the hate classification task. Finally, we discuss the shortcomings of our model by qualitatively analyzing misclassified samples. Our code and dataset are publicly available at: https://github.com/SiddhantBikram/MemeCLIP
DOI: 10.48550/arxiv.2409.14703
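
The record itself contains no implementation details, but as a rough illustration of the general approach the abstract describes (classifying a meme from frozen pre-trained CLIP image and text representations with a small trainable component), a minimal sketch follows. The checkpoint name, the OCR-extracted caption, the file path, the label mapping, and the linear fusion head are illustrative assumptions; this is not the paper's MemeCLIP architecture.

```python
# Illustrative sketch only (not the paper's MemeCLIP): fuse frozen CLIP image and
# text features for one meme and train a lightweight classification head on top.
# Assumes the meme's overlaid text has already been extracted (e.g., via OCR).
import torch
import torch.nn as nn
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device).eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Freeze CLIP so downstream training cannot overwrite its pre-trained knowledge.
for p in clip.parameters():
    p.requires_grad = False

class MemeHead(nn.Module):
    """Small trainable classifier over concatenated CLIP image + text embeddings."""
    def __init__(self, dim: int = 512, num_classes: int = 2):
        super().__init__()
        self.fc = nn.Linear(2 * dim, num_classes)

    def forward(self, img_feat: torch.Tensor, txt_feat: torch.Tensor) -> torch.Tensor:
        return self.fc(torch.cat([img_feat, txt_feat], dim=-1))

head = MemeHead().to(device)

@torch.no_grad()
def encode(image: Image.Image, text: str):
    """Return L2-normalized CLIP embeddings for one meme image and its text."""
    inputs = processor(text=[text], images=image, return_tensors="pt",
                       padding=True, truncation=True).to(device)
    img = clip.get_image_features(pixel_values=inputs["pixel_values"])
    txt = clip.get_text_features(input_ids=inputs["input_ids"],
                                 attention_mask=inputs["attention_mask"])
    return img / img.norm(dim=-1, keepdim=True), txt / txt.norm(dim=-1, keepdim=True)

# One training step on a single (image, text, label) sample.
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

image = Image.open("meme.jpg").convert("RGB")            # hypothetical file path
img_feat, txt_feat = encode(image, "example overlaid meme text")
logits = head(img_feat, txt_feat)
loss = criterion(logits, torch.tensor([1], device=device))  # label 1 = hateful (assumed mapping)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Freezing the CLIP backbone and training only the small head mirrors the abstract's stated goal of efficient downstream learning that preserves the pre-trained model's knowledge; the actual MemeCLIP design may differ.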