A Cross-modal Alignment for Zero-shot Image Classification

Bibliographic Details
Published in: IEEE Access, 2023-01, Vol. 11, p. 1-1
Main Authors: Wu, Lu; Wu, Chenyu; Guo, Han; Zhao, Zhihao
Format: Article
Language: English
Online Access: Full text
Description
Summary: Unlike mainstream classification methods that rely on large amounts of annotated data, we introduce a cross-modal alignment for zero-shot image classification. The key is to use a text attribute query learned from the seen classes to guide local feature responses in the unseen classes. First, an encoder module aligns the semantic matching between visual features and their corresponding text attribute parts. Then, an attention module obtains response maps by integrating the text attribute query into the feature maps. Finally, the cosine distance metric measures the matching degree between the text attribute query and the corresponding feature response. The experimental results show that our method outperforms existing embedding-based ZSL methods as well as generative methods.
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2023.3237966
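
The summary above describes a three-step pipeline: attribute-aligned encoding, query-driven attention over feature maps, and cosine matching between each text attribute query and its pooled feature response. Below is a minimal PyTorch sketch of the last two steps; the function name, tensor shapes, and the scaled-softmax attention are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn.functional as F

def attribute_response_scores(feature_map, attribute_queries):
    """Match text attribute queries against local visual features.

    feature_map:       (B, C, H, W) local features from a visual backbone
    attribute_queries: (A, C)       text attribute embeddings learned on seen classes
    Returns:           (B, A)       cosine matching score per attribute
    """
    B, C, H, W = feature_map.shape
    A = attribute_queries.shape[0]

    # Flatten the spatial grid: (B, H*W, C)
    feats = feature_map.flatten(2).transpose(1, 2)

    # Attention of each attribute query over spatial locations: (B, A, H*W)
    attn = torch.einsum('bnc,ac->ban', feats, attribute_queries)
    attn = F.softmax(attn / C ** 0.5, dim=-1)

    # Attribute-specific feature responses via attention-weighted pooling: (B, A, C)
    responses = torch.einsum('ban,bnc->bac', attn, feats)

    # Cosine similarity between each query and its feature response: (B, A)
    queries = attribute_queries.unsqueeze(0).expand(B, A, C)
    return F.cosine_similarity(responses, queries, dim=-1)

# Example (shapes only): scores for 2 images, 85 attributes, 512-d features
# scores = attribute_response_scores(torch.randn(2, 512, 7, 7), torch.randn(85, 512))
```

At inference, the per-attribute scores could then be compared against the attribute signatures of the unseen classes, for example by assigning the class whose attribute vector best matches the predicted scores.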