Granularity-aware Adaptation for Image Retrieval over Multiple Tasks
Saved in:
Main authors: | |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Order full text |
Abstract: | Strong image search models can be learned for a specific domain, i.e., a set of labels, provided that some labeled images of that domain are available. A practical visual search model, however, should be versatile enough to solve multiple retrieval tasks simultaneously, even if those cover very different specialized domains. Additionally, it should be able to benefit even from unlabeled images from these various retrieval tasks. This is the more practical scenario that we consider in this paper. We address it with the proposed Grappa, an approach that starts from a strong pretrained model and adapts it to tackle multiple retrieval tasks concurrently, using only unlabeled images from the different task domains. We extend the pretrained model with multiple independently trained sets of adaptors that use pseudo-label sets of different sizes, effectively mimicking different pseudo-granularities. We reconcile all adaptor sets into a single unified model suited for all retrieval tasks by learning fusion layers, which we guide by propagating pseudo-granularity attentions across neighbors in the feature space. Results on a benchmark composed of six heterogeneous retrieval tasks show that the unsupervised Grappa model improves the zero-shot performance of a state-of-the-art self-supervised learning model, and in some cases reaches or improves over a task label-aware oracle that selects the most fitting pseudo-granularity per task. |
DOI: | 10.48550/arxiv.2210.02254 |
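
The abstract describes an architecture built from a frozen pretrained encoder, several adaptor heads (one per pseudo-granularity, each nominally trained on pseudo-labels obtained at a different label-set size, e.g. by clustering with different numbers of clusters), and learned fusion layers that weight the adaptors. Below is a minimal, hypothetical PyTorch sketch of that idea; the class names, dimensions, and the exact fusion form are illustrative assumptions and are not taken from the paper.

```python
# Hypothetical sketch of the Grappa idea from the abstract: per-granularity
# adaptors over frozen features, fused with learned attention weights.
import torch
import torch.nn as nn


class GranularityAdaptor(nn.Module):
    """One adaptor head, nominally trained with pseudo-labels at one granularity."""

    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual adaptation of the frozen backbone feature.
        return x + self.proj(x)


class GrappaFusion(nn.Module):
    """Fuses all adaptor outputs into a single retrieval embedding."""

    def __init__(self, dim: int, num_granularities: int):
        super().__init__()
        self.adaptors = nn.ModuleList(
            GranularityAdaptor(dim) for _ in range(num_granularities)
        )
        # Attention scores over granularities, predicted from the frozen feature.
        self.attn = nn.Linear(dim, num_granularities)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, dim) features from a frozen pretrained backbone.
        per_gran = torch.stack([a(feats) for a in self.adaptors], dim=1)  # (B, K, D)
        weights = torch.softmax(self.attn(feats), dim=-1).unsqueeze(-1)   # (B, K, 1)
        fused = (weights * per_gran).sum(dim=1)                           # (B, D)
        return nn.functional.normalize(fused, dim=-1)


if __name__ == "__main__":
    # Toy usage: fuse K=4 pseudo-granularity adaptors over 512-d features.
    model = GrappaFusion(dim=512, num_granularities=4)
    dummy_feats = torch.randn(8, 512)
    print(model(dummy_feats).shape)  # torch.Size([8, 512])
```

In the sketch, the attention weights are predicted directly from each feature; the paper's guidance by propagating pseudo-granularity attentions across feature-space neighbors would be an additional training signal on those weights, which is not reproduced here.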