DLI-Net: Dual Local Interaction Network for Fine-Grained Sketch-Based Image Retrieval

Bibliographic Details
Published in: IEEE Transactions on Circuits and Systems for Video Technology, 2022-10, Vol. 32 (10), p. 7177-7189
Main authors: Sun, Haifeng; Xu, Jiaqing; Wang, Jingyu; Qi, Qi; Ge, Ce; Liao, Jianxin
Format: Article
Language: English
Description
Abstract: Fine-grained sketch-based image retrieval (FG-SBIR) is considered an ideal method of image retrieval because sketches are information-rich and easy to produce. It aims to find the most similar photo in a photo gallery given an input sketch. Most previous works follow the paradigm of first extracting global features and then projecting the sketch and photo features into a unified embedding space using triplet loss. However, global features are not appropriate for capturing the crucial fine-grained information. Based on this observation, we propose a Dual Local Interaction Network (DLI-Net). DLI-Net explores an effective and efficient way to utilize local features for FG-SBIR. Specifically, we first propose a Local Feature Extractor to extract mid-level local features. Then, in response to the problems introduced by local features, we propose a Dual Interaction Module, which contains a Self Interaction Module and a Cross Interaction Module. The Self Interaction Module speeds up retrieval by eliminating redundant local features of the background. The Cross Interaction Module solves spatial misalignment by making the sketches interact with the photos. Extensive experiments on six commonly used datasets show that DLI-Net outperforms state-of-the-art competitors by a significant margin with a reasonable retrieval speed. Moreover, to the best of our knowledge, DLI-Net is the first model that beats humans on all six datasets. In addition, DLI-Net also performs best on the cross-category fine-grained sketch-based image retrieval task, which further demonstrates that local features are more appropriate for FG-SBIR.
ISSN: 1051-8215, 1558-2205
DOI: 10.1109/TCSVT.2022.3171972
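
The abstract contrasts DLI-Net with the common baseline paradigm: embed the sketch and each photo as a single global feature vector, train with triplet loss, and retrieve by nearest neighbor. The following is a minimal sketch of that baseline paradigm only, not of DLI-Net itself. The function names, the margin of 0.2, the use of unsquared Euclidean distance, and the toy 2-D vectors are all assumptions made for illustration; none of them come from the paper.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss: zero once the matching photo is closer
    to the sketch than the non-matching photo by at least `margin`.
    The margin value is an illustrative assumption."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

def retrieve(sketch, gallery):
    """Return the index of the gallery embedding nearest to the sketch."""
    return min(range(len(gallery)), key=lambda i: euclidean(sketch, gallery[i]))

# Toy global embeddings (hypothetical values, standing in for network outputs).
sketch = [1.0, 0.0]
gallery = [[0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]

best = retrieve(sketch, gallery)                      # index of the nearest photo
loss = triplet_loss(sketch, gallery[1], gallery[0])   # photo 1 as positive, photo 0 as negative
```

Because each image is reduced to one vector here, spatial detail is discarded before comparison, which is the limitation the abstract attributes to global features.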