Meta-learning For Vision-and-language Cross-lingual Transfer
Format: Article
Language: English
Online access: Order full text
Abstract: Current pre-trained vision-language models (PVLMs) achieve excellent
performance on a range of multi-modal datasets. Recent work has aimed at
building multilingual models, and a range of novel multilingual multi-modal
datasets have been proposed. Current PVLMs typically perform poorly on these
datasets when used for multi-modal zero-shot or few-shot cross-lingual
transfer, especially for low-resource languages. To alleviate this problem, we
propose a novel meta-learning fine-tuning framework. Our framework enables
current PVLMs to adapt rapidly to new languages in vision-language scenarios by
designing MAML in a cross-lingual multi-modal manner. Experiments show that our
method boosts the performance of current state-of-the-art PVLMs in both
zero-shot and few-shot cross-lingual transfer on a range of vision-language
understanding tasks and datasets (XVNLI, xGQA, MaRVL, xFlickr&Co).
DOI: 10.48550/arxiv.2305.14843
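
The abstract describes MAML-style episodic fine-tuning: adapt the PVLM on source-language examples in an inner loop, then update the shared initialization so the adapted weights transfer to a target language. The sketch below illustrates the general first-order MAML update only; the names `model`, `vl_loss`, and `sample_episode` are hypothetical placeholders, and this is not the authors' implementation.

```python
# Minimal first-order MAML-style sketch for cross-lingual adaptation of a
# vision-language model. `model` is any torch.nn.Module, `vl_loss(model, batch)`
# returns a scalar loss, and `sample_episode()` yields (support, query) batches,
# e.g. English support data and target-language query data. All three are
# assumed/hypothetical, not part of the paper.
import copy
import torch


def fomaml_step(model, sample_episode, vl_loss,
                inner_lr=1e-3, outer_lr=1e-5, inner_steps=3, meta_batch=4):
    """One meta-update: adapt a copy of the model on support (source-language)
    data, evaluate on query (target-language) data, and apply the averaged
    query gradients to the original weights (first-order MAML approximation)."""
    meta_grads = [torch.zeros_like(p) for p in model.parameters()]

    for _ in range(meta_batch):
        support, query = sample_episode()
        learner = copy.deepcopy(model)              # task-specific fast weights
        inner_opt = torch.optim.SGD(learner.parameters(), lr=inner_lr)

        # Inner loop: adapt on the support (source-language) batch.
        for _ in range(inner_steps):
            inner_opt.zero_grad()
            vl_loss(learner, support).backward()
            inner_opt.step()

        # Outer loss: how well the adapted weights transfer to the query
        # (target-language) batch.
        learner.zero_grad()
        vl_loss(learner, query).backward()
        for g, p in zip(meta_grads, learner.parameters()):
            if p.grad is not None:
                g += p.grad.detach() / meta_batch

    # Apply the accumulated meta-gradients to the original initialization.
    with torch.no_grad():
        for p, g in zip(model.parameters(), meta_grads):
            p -= outer_lr * g
```

Repeating this step over many (source, target) language episodes yields an initialization that can be fine-tuned on a new language with only a few examples, which is the intuition behind the zero-shot and few-shot gains reported in the abstract.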