A Cross-modal Alignment for Zero-shot Image Classification
Saved in:
Published in: | IEEE Access 2023-01, Vol. 11, p. 1-1 |
Main authors: | , , , |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Full text |
Summary: | Different from most classification methods, which rely on large amounts of annotated data, we introduce a cross-modal alignment for zero-shot image classification. The key idea is to use a text attribute query, learned from the seen classes, to guide local feature responses in unseen classes. First, an encoder module aligns visual features with their corresponding text attribute parts. Then, an attention module produces response maps by integrating the text attribute query into the feature maps. Finally, the cosine distance metric measures the matching degree between the text attribute query and the corresponding feature response. Experimental results show that our method outperforms existing embedding-based ZSL methods as well as generative methods. |
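The summary outlines a three-step pipeline: attribute queries attend over local feature maps to produce per-attribute responses, which are then scored against class attribute embeddings by cosine similarity. The paper's actual architecture is not reproduced in this record, so the following is only a minimal numpy sketch of that general idea; all function names, shapes, and the softmax-attention formulation are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax over the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attribute_attention(feature_map, attribute_queries):
    """Sketch of attribute-query attention (shapes are assumptions).

    feature_map:       (H*W, d) local visual features
    attribute_queries: (A, d)   learned text attribute queries
    returns:           (A, d)   per-attribute feature responses
    """
    # response map: how strongly each attribute query fires at each location
    weights = softmax(attribute_queries @ feature_map.T, axis=-1)  # (A, H*W)
    # attribute-specific responses: attention-weighted sum over locations
    return weights @ feature_map

def cosine_scores(responses, class_attribute_embeddings):
    """Score each candidate class by mean cosine similarity between the
    image's attribute responses and that class's attribute embeddings."""
    r = responses / np.linalg.norm(responses, axis=-1, keepdims=True)
    scores = []
    for attrs in class_attribute_embeddings:  # each attrs: (A, d)
        a = attrs / np.linalg.norm(attrs, axis=-1, keepdims=True)
        scores.append(np.mean(np.sum(r * a, axis=-1)))
    return np.array(scores)
```

At inference time a zero-shot prediction would pick the unseen class whose attribute embeddings maximize `cosine_scores`; this matches the summary's use of cosine distance as the matching metric, though the real model's attention and alignment modules are certainly more elaborate.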
ISSN: | 2169-3536 |
DOI: | 10.1109/ACCESS.2023.3237966 |