Dual-Modal Prompting for Sketch-Based Image Retrieval
Main Authors: | , , , , , |
---|---|
Format: | Article |
Language: | English |
Subjects: | |
Online Access: | Order full text |
Summary: | Sketch-based image retrieval (SBIR) associates hand-drawn sketches with their
corresponding realistic images. In this study, we aim to tackle two major
challenges of this task simultaneously: i) zero-shot, dealing with unseen
categories, and ii) fine-grained, referring to intra-category instance-level
retrieval. Our key innovation lies in the realization that solely addressing
this cross-category and fine-grained recognition task from the generalization
perspective may be inadequate since the knowledge accumulated from limited seen
categories might not be fully valuable or transferable to unseen target
categories. Motivated by this, we propose a dual-modal prompting
CLIP (DP-CLIP) network, in which an adaptive prompting strategy is designed.
Specifically, to facilitate the adaptation of our DP-CLIP toward unpredictable
target categories, we employ a set of images within the target category and the
textual category label to respectively construct a set of category-adaptive
prompt tokens and channel scales. By integrating the generated guidance,
DP-CLIP can gain valuable category-centric insights, efficiently adapting to
novel categories and capturing unique discriminative clues for effective
retrieval within each target category. With these designs, our DP-CLIP
outperforms the state-of-the-art fine-grained zero-shot SBIR method by 7.3% in
Acc.@1 on the Sketchy dataset. On two other category-level
zero-shot SBIR benchmarks, our method also achieves promising performance. |
---|---|
DOI: | 10.48550/arxiv.2404.18695 |
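
The summary describes two sources of category-specific guidance: prompt tokens built from a few images of the target category, and channel scales built from the textual category label. The sketch below is only a toy illustration of that idea in plain Python, not the DP-CLIP implementation. Every function name is hypothetical, and the concrete choices (mean pooling for the prompt token, a sigmoid mapping for the scales, mean pooling as a stand-in for the encoder) are assumptions made for illustration. The last function shows how Acc.@1 is commonly computed for instance-level retrieval, assuming sketch i is paired with image i.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def prompt_token_from_images(support_features):
    """Assumption: pool features of a few target-category images into one prompt token."""
    return mean_vector(support_features)


def channel_scales_from_text(text_embedding):
    """Assumption: map the label embedding to per-channel scales in (0, 2) with a sigmoid."""
    return [2.0 / (1.0 + math.exp(-t)) for t in text_embedding]


def encode_with_guidance(patch_tokens, prompt_tokens, scales):
    """Prepend prompt tokens, pool all tokens (toy stand-in for an encoder), scale channels."""
    tokens = prompt_tokens + patch_tokens
    pooled = mean_vector(tokens)
    return [s * c for s, c in zip(scales, pooled)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def acc_at_1(sketch_features, image_features):
    """Fraction of sketches whose most similar gallery image is the paired one (same index)."""
    hits = 0
    for i, sketch in enumerate(sketch_features):
        best = max(range(len(image_features)),
                   key=lambda j: cosine(sketch, image_features[j]))
        hits += int(best == i)
    return hits / len(sketch_features)
```

In this sketch the image-derived prompt token changes what the encoder sees, while the text-derived scales reweight the output channels; in a real model both would feed into a CLIP-style transformer rather than a simple mean.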