Domain disentanglement and fusion based on hyperbolic neural networks for zero-shot sketch-based image retrieval

Bibliographic details
Published in:	Information Processing & Management 2025-01, Vol.62 (1), p.103963, Article 103963
Main authors:	Zhang, Qing; Zhang, Jing; Su, Xiangdong; Wang, Yonghe; Bao, Feilong; Gao, Guanglai
Format:	Article
Language:	English
Keywords:	Domain disentangled; Hyperbolic space; Sketch-based image retrieval; Zero-shot learning

Description
Abstract:
• We propose a ZS-SBIR model based on hyperbolic neural networks.
• Hyperbolic space is more suitable for feature representation than Euclidean space.
• The domain disentanglement network effectively aligns image and sketch features.
• The domain fusion network enhances the representation capability of retrieval features.
• The proposed model surpasses state-of-the-art models.
Despite the progress of zero-shot sketch-based image retrieval (ZS-SBIR), existing methods still face two major challenges: (1) Euclidean space cannot effectively represent data with hierarchical structure, which leads to non-discriminative retrieval features; and (2) visual information alone is insufficient to align cross-domain features and maximize their domain generalization capability. To tackle these issues, this paper designs a ZS-SBIR framework based on hyperbolic neural networks that combines domain disentanglement and fusion learning, called “DDFUS”. Specifically, we present a contrastive cross-modal learning method that guides the alignment of multi-domain visual representations with semantic representations in hyperbolic space, ensuring that each visual representation carries rich hierarchical semantic information. Furthermore, we propose a domain disentanglement method based on hyperbolic neural networks that uses paired hyperbolic encoders to decompose each domain's representation into domain-invariant and domain-specific features, reducing information interference between domains. Moreover, we design a cross-domain fusion method that promotes the fusion and exchange of multi-domain information by reconstructing and generating cross-domain samples, which significantly enhances the representation and generalization capabilities of the domain-invariant features.
Comprehensive experiments demonstrate that the mAP@all of our DDFUS model surpasses CNN-based models by 18.99 % on the Sketchy dataset, 1.93 % on the more difficult TU-Berlin dataset, and 11.4 % on the more challenging QuickDraw dataset.
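The results above are reported as mAP@all. Under the usual definition, each query sketch ranks the entire gallery by distance. Average precision (AP) is computed over that full ranking, and AP is then averaged over all queries. The sketch below follows that definition and treats a gallery image as relevant when it shares the query's class label. The function names are hypothetical, and the authors' evaluation script may differ in details such as tie-breaking. Here, Python's stable sort keeps the original gallery order for equal distances.

```python
from typing import Hashable, List, Sequence


def average_precision(ranked_relevance: Sequence[bool]) -> float:
    """AP over a full ranking: mean of precision@k at every relevant position k."""
    hits = 0
    precisions: List[float] = []
    for k, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions) if precisions else 0.0


def map_at_all(
    query_labels: Sequence[Hashable],
    gallery_labels: Sequence[Hashable],
    distances: Sequence[Sequence[float]],
) -> float:
    """distances[q][g] is the distance from query q to gallery item g (lower = more similar)."""
    aps = []
    for q, q_label in enumerate(query_labels):
        # Rank the whole gallery ("@all") for this query.
        order = sorted(range(len(gallery_labels)), key=lambda g: distances[q][g])
        relevance = [gallery_labels[g] == q_label for g in order]
        aps.append(average_precision(relevance))
    return sum(aps) / len(aps)


# Usage: two query sketches against a three-image gallery.
queries = ["cat", "dog"]
gallery = ["cat", "dog", "cat"]
dists = [[0.1, 0.5, 0.9], [0.2, 0.3, 0.1]]
print(round(map_at_all(queries, gallery, dists), 4))  # prints 0.5833
```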
ISSN:	0306-4573
DOI:	10.1016/j.ipm.2024.103963