Characterizing Hierarchical Semantic-Aware Parts With Transformers for Generalized Zero-Shot Learning


Detailed Description

Bibliographic Details
Published in: IEEE Transactions on Circuits and Systems for Video Technology, 2024-11, Vol. 34 (11), pp. 11493-11506
Authors: Zhao, Peng; Xi, Xiaoming; Wang, Qiangchang; Yin, Yilong
Format: Article
Language: English
Description
Abstract: This paper presents a novel Transformer architecture for zero-shot learning (ZSL), termed TransZSL, which characterizes hierarchical semantic-aware parts. It consists of an adaptive token refinement (ATR) module, a hierarchical token aggregation (HTA) module, and semantic-aware prototypes (SAP). First, a ViT backbone is used to provide comprehensive local information without missing details. To address the varying degrees of noise caused by large appearance variations, the ATR adaptively highlights important tokens and suppresses useless ones. However, because of complex image structure, some important tokens may be incorrectly discarded; a random perturbation is therefore introduced to randomly reactivate discarded tokens, reducing the risk of losing discriminative information. Second, because dataset descriptions contain both low- and high-level attributes, the HTA aggregates complementary hierarchical tokens from multiple ViT layers. Third, semantically similar content may be distributed across different tokens; the SAP addresses this by grouping semantically identical tokens into a single prototype, focusing on semantic-aware parts. In addition, a diversity loss encourages the network to learn diverse prototypes that discover diverse parts. Both qualitative and quantitative results on several challenging tasks demonstrate the usefulness and effectiveness of the proposed methods.
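The ATR idea described above (score tokens, keep the most important ones, then randomly reactivate some of the discarded tokens) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the importance score (here a simple token norm), the `keep_ratio`, and the `reactivate_prob` parameters are all assumptions for demonstration purposes.

```python
import numpy as np

def adaptive_token_refinement(tokens, keep_ratio=0.5, reactivate_prob=0.2, seed=0):
    """Illustrative sketch of ATR-style token selection (hypothetical, not the
    paper's code): rank tokens by an importance score, keep the top fraction,
    then randomly reactivate discarded tokens to avoid losing discriminative
    information, as the paper's random perturbation is described to do."""
    rng = np.random.default_rng(seed)
    n, _ = tokens.shape
    # Stand-in importance score: token feature norm (assumption; the paper
    # would learn such scores inside the Transformer).
    scores = np.linalg.norm(tokens, axis=1)
    k = max(1, int(n * keep_ratio))
    keep = np.zeros(n, dtype=bool)
    keep[np.argsort(-scores)[:k]] = True
    # Random perturbation: each discarded token is reactivated with
    # probability reactivate_prob.
    reactivated = (~keep) & (rng.random(n) < reactivate_prob)
    keep |= reactivated
    return tokens[keep], keep

# Example usage on dummy token features (8 tokens, 4-dim each).
tokens = np.random.default_rng(1).normal(size=(8, 4))
kept, mask = adaptive_token_refinement(tokens, keep_ratio=0.5, reactivate_prob=0.5)
```

With `keep_ratio=0.5`, at least half of the tokens always survive, and the random reactivation step can only add tokens back, never remove them, which matches the stated goal of reducing the risk of discarding discriminative information.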
ISSN: 1051-8215, 1558-2205
DOI: 10.1109/TCSVT.2024.3422491