Feature alignment via mutual mapping for few-shot fine-grained visual classification

Bibliographic Details
Published in: Image and Vision Computing, July 2024, Vol. 147, Article 105032
Authors: Wu, Qin; Song, Tingting; Fan, Shengnan; Chen, Zeda; Jin, Kelei; Zhou, Haojie
Format: Article
Language: English
Online access: Full text
Description
Abstract: Few-shot fine-grained visual classification aims to identify fine-grained concepts from very few samples and is widely used in many fields, such as classifying bird species in biological research and identifying car models in traffic monitoring. Compared with the common few-shot classification task, it is more difficult because of large variations within each class and small gaps between different categories. To address these problems, previous studies primarily project support samples into the space of query samples and employ metric learning to classify query images into their respective categories. However, we observe that such methods are not effective at resolving inter-class variations. To overcome this limitation, we propose a new feature alignment method based on mutual mapping, which simultaneously considers the discriminative features of new samples and classes. Specifically, besides projecting support samples into the space of query samples to reduce intra-class variations, we also project query samples into the space of support samples to increase inter-class variations. Furthermore, a direct position self-reconstruction module is proposed to exploit the location information of objects and obtain more discriminative features. Extensive experiments on four fine-grained benchmarks demonstrate that our approach is competitive with other state-of-the-art methods in both 1-shot and 5-shot settings. In the 5-shot setting, our method achieves the best performance on all four datasets, with accuracies of 92.11%, 85.31%, 96.09%, and 94.64% on CUB-200-2011, Stanford Dogs, Stanford Cars, and Aircraft, respectively.

Highlights:
• A direct position self-reconstruction module is proposed to enhance features.
• A mutual mapping method is used to handle inter-class and intra-class variations.
• Spatial and channel mutual mappings are applied for better feature alignment.
• Extensive benchmark experiments demonstrate the superiority of our method.
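As a rough illustration of the mutual-mapping idea in the abstract, the sketch below projects pooled support features into the query space and pooled query features into the support space via ridge-regression reconstruction, then scores a class by the residuals in both directions. The reconstruction operator, the pooled (m, d) feature shapes, the `lam` regularizer, and the joint query-pool reconstruction are illustrative assumptions, not the paper's exact formulation; per the highlights, the actual method applies the mappings spatially and channel-wise over feature maps.

```python
import torch

def reconstruct(source: torch.Tensor, target: torch.Tensor, lam: float = 0.1) -> torch.Tensor:
    """Reconstruct `target` from the rows of `source` via ridge regression.

    source: (m, d) feature vectors spanning the space we project into.
    target: (n, d) feature vectors to be reconstructed.
    Returns the (n, d) least-squares reconstruction of `target`.
    """
    gram = source @ source.t()                                    # (m, m)
    eye = torch.eye(source.size(0), dtype=source.dtype, device=source.device)
    # Solve (S S^T + lam I) X = S T^T, i.e. X = (S S^T + lam I)^{-1} S T^T.
    coef = torch.linalg.solve(gram + lam * eye, source @ target.t())  # (m, n)
    return coef.t() @ source                                      # (n, d)

def mutual_mapping_score(support: torch.Tensor, query: torch.Tensor,
                         lam: float = 0.1) -> torch.Tensor:
    """Score queries against one class's support set, mapping in both directions."""
    q_hat = reconstruct(support, query, lam)   # support -> query space (intra-class term)
    s_hat = reconstruct(query, support, lam)   # query -> support space (inter-class term)
    intra = ((query - q_hat) ** 2).sum(dim=-1)              # (n,) residual per query
    # Simplification: the support set is reconstructed from the query pool jointly,
    # giving one shared inter-class residual per class rather than one per query.
    inter = ((support - s_hat) ** 2).sum(dim=-1).mean()     # scalar per class
    return -(intra + inter)                                 # higher score = better match

# Toy 5-way episode: assign each query to the class with the highest score.
if __name__ == "__main__":
    torch.manual_seed(0)
    d, shots, n_query, n_way = 64, 5, 8, 5
    support_sets = [torch.randn(shots, d) for _ in range(n_way)]
    queries = torch.randn(n_query, d)
    scores = torch.stack([mutual_mapping_score(s, queries) for s in support_sets], dim=1)
    print(scores.argmax(dim=1))   # predicted class index per query
```

The two residuals play the roles described in the abstract: the support-to-query reconstruction penalizes intra-class spread, while the query-to-support reconstruction rewards class-specific support features that queries of other classes cannot reproduce.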
ISSN: 0262-8856
eISSN: 1872-8138
DOI: 10.1016/j.imavis.2024.105032