Rethink, Revisit, Revise: A Spiral Reinforced Self-Revised Network for Zero-Shot Learning

Detailed Description

Saved in:
Bibliographic Details
Published in: IEEE Transactions on Neural Networks and Learning Systems 2024-01, Vol. 35 (1), p. 657-669
Main Authors: Liu, Zhe, Li, Yun, Yao, Lina, McAuley, Julian, Dixon, Sam
Format: Article
Language: English
Subjects:
Online Access: Order full text
Description
Abstract: Current approaches to zero-shot learning (ZSL) struggle to learn generalizable semantic knowledge capable of capturing complex correlations. Inspired by Spiral Curriculum, which enhances learning processes by revisiting knowledge, we propose a form of spiral learning that revisits visual representations based on a sequence of attribute groups (e.g., a combined group of color and shape). Spiral learning aims to learn generalized local correlations, enabling models to gradually enhance global learning and, thus, understand complex correlations. Our implementation is based on a two-stage reinforced self-revised (RSR) framework: preview and review. RSR first previews visual information to construct diverse attribute groups in a weakly supervised manner. Then, it spirally learns refined localities based on attribute groups and uses localities to revise global semantic correlations. Our framework outperforms state-of-the-art algorithms on four benchmark datasets in both zero-shot and generalized zero-shot settings, which demonstrates the effectiveness of spiral learning in learning generalizable and complex correlations. We also conduct extensive analysis to show that attribute groups and reinforced decision processes can capture complementary semantic information to improve predictions and aid explainability.
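To make the two-stage preview/review idea concrete, here is a deliberately simplified toy sketch of the workflow the abstract describes: a preview step that partitions attributes into groups, followed by a review step that revisits each group in sequence and blends its local evidence into a running global score. All names (`preview_groups`, `spiral_revise`, the blending weight `alpha`) are hypothetical illustrations, not the paper's actual API or method.

```python
# Toy sketch of the preview/review spiral idea (not the RSR implementation).

def preview_groups(attributes, group_size=2):
    """Preview stage (toy): partition attribute names into small groups,
    e.g. a combined group of color and shape."""
    return [attributes[i:i + group_size]
            for i in range(0, len(attributes), group_size)]

def spiral_revise(global_score, local_scores, alpha=0.5):
    """Review stage (toy): revisit each attribute group's local score in
    sequence, blending it into the running global score."""
    score = global_score
    for local in local_scores:
        score = (1 - alpha) * score + alpha * local  # revise with locality
    return score

attributes = ["color", "shape", "texture", "size"]
groups = preview_groups(attributes)   # [['color', 'shape'], ['texture', 'size']]
local_scores = [0.8, 0.6]             # made-up per-group compatibility scores
revised = spiral_revise(0.2, local_scores)
```

In the toy run, the initial global score 0.2 is revised first toward 0.8 (giving 0.5) and then toward 0.6 (giving 0.55), illustrating how each revisit nudges the global estimate with local, group-level evidence.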
ISSN:2162-237X
2162-2388
DOI:10.1109/TNNLS.2022.3176282