Task-Related Saliency for Few-Shot Image Classification

Bibliographic Details
Published in: IEEE Transactions on Neural Networks and Learning Systems, 2024-08, Vol. 35 (8), p. 10751-10763
Main Authors: Zhou, Zhenyu; Luo, Lei; Zhou, Sihang; Li, Wang; Yang, Xihong; Liu, Xinwang; Zhu, En
Format: Article
Language: English
Description
Summary: A weakness of existing metric-based few-shot classification methods is that task-unrelated objects or backgrounds may mislead the model, since the small number of samples in the support set is insufficient to reveal the task-related targets. An essential cue of human wisdom in the few-shot classification task is that humans can recognize the task-related targets at a glimpse of the support images without being distracted by task-unrelated things. Thus, we propose to explicitly learn task-related saliency features and make use of them in the metric-based few-shot learning scheme. We divide the task into three phases: modeling, analyzing, and matching. In the modeling phase, we introduce a saliency sensitive module (SSM), which is an inexact supervision task jointly trained with a standard multiclass classification task. SSM not only enhances the fine-grained representation of the feature embedding but can also locate the task-related saliency features. Meanwhile, we propose a self-training-based task-related saliency network (TRSN), a lightweight network that distills the task-related saliency produced by SSM. In the analyzing phase, we freeze TRSN and use it to handle novel tasks. TRSN extracts task-relevant features while suppressing the disturbing task-unrelated features. We can therefore discriminate samples accurately in the matching phase by strengthening the task-related features. We conduct extensive experiments in five-way 1-shot and 5-shot settings to evaluate the proposed method. Results show that our method achieves consistent performance gains on benchmarks and reaches state-of-the-art results.
ISSN:2162-237X
2162-2388
DOI:10.1109/TNNLS.2023.3243903
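
Illustrative sketch (not taken from the article): the summary describes strengthening task-related features before metric-based matching. The NumPy code below shows one generic way such an idea could look, assuming each image is already encoded as an (H, W, C) feature map and that a saliency map of shape (H, W) is available. The function names `saliency_pool` and `classify_episode`, the saliency-weighted pooling, the class-mean prototypes, and the cosine similarity are all assumptions made for illustration; they are not the authors' SSM or TRSN.

```python
import numpy as np


def saliency_pool(feature_map, saliency):
    """Pool an (H, W, C) feature map into a (C,) vector.

    Each spatial location is weighted by a non-negative (H, W) saliency
    map, so low-saliency (assumed task-unrelated) regions contribute less.
    The saliency map is a hypothetical stand-in for a learned saliency output.
    """
    weights = saliency / (saliency.sum() + 1e-8)
    return np.tensordot(weights, feature_map, axes=([0, 1], [0, 1]))


def _l2_normalize(x):
    """Normalize each row to unit length for cosine similarity."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-8)


def classify_episode(support_maps, support_sal, support_labels,
                     query_maps, query_sal, n_way):
    """Assign each query to the nearest class prototype (cosine similarity).

    Prototypes are the per-class means of saliency-pooled support vectors,
    in the style of prototype-based metric learning (an assumption here).
    """
    support_vecs = np.stack(
        [saliency_pool(f, s) for f, s in zip(support_maps, support_sal)]
    )
    query_vecs = np.stack(
        [saliency_pool(f, s) for f, s in zip(query_maps, query_sal)]
    )
    prototypes = np.stack(
        [support_vecs[support_labels == c].mean(axis=0) for c in range(n_way)]
    )
    sims = _l2_normalize(query_vecs) @ _l2_normalize(prototypes).T
    return sims.argmax(axis=1)
```

In a five-way 1-shot episode, `support_maps` would hold five feature maps (one per class) and `n_way` would be 5. How the saliency maps are actually produced and applied in the published method is described in the full text, not in this record.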