{A^{2}Pt}: Anti-Associative Prompt Tuning for Open Set Visual Recognition

Detailed Description

Bibliographic Details
Published in: IEEE Transactions on Multimedia, 2024, Vol. 26, pp. 8419-8431
Authors: Ren, Hairui; Tang, Fan; Pan, Xingjia; Cao, Juan; Dong, Weiming; Lin, Zhiwen; Yan, Ke; Xu, Changsheng
Format: Article
Language: English
Description
Abstract: Multi-modality pre-trained models (PTMs) have considerably boosted performance on a broad range of computer vision topics. Still, they have not been purposefully explored in open set recognition (OSR) scenarios when applying PTMs to downstream recognition tasks. Directly fine/prompt tuning PTMs on closed-set classification tasks inevitably suffers from data bias and tends to learn target-class-irrelevant co-occurring contextual information, which leads to over-confident predictions on unknown samples. In this paper, we propose a simple yet effective approach, termed Anti-Associative Prompt Tuning ({A^{2}Pt}), toward learning a compact and accurate class-related representation with few class-irrelevant associations from context using multi-modal priors. Specifically, a cross-modal guided activation module is adopted to refine the class-aware representation and suppress the associations from co-occurring contexts by involving text-modal information. We further design an anti-association calibration module to obtain compact class-aware and class-irrelevant representations, respectively, by introducing two additional objective functions. Extensive experiments on publicly available benchmarks, including the CIFAR series, TinyImageNet, and ImageNet-21K-P, show that the proposed {A^{2}Pt} achieves substantial and consistent performance gains compared with both SOTA OSR and PTM prompt tuning approaches.
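To make the cross-modal guided activation idea concrete, here is a minimal illustrative sketch (not the authors' implementation; all function names, shapes, and the sigmoid gating are assumptions). It gates image patch features by their similarity to a text embedding of the class name, suppressing context features that are weakly aligned with the class concept:

```python
import numpy as np

def cross_modal_guided_activation(patch_feats, text_embed, temperature=0.1):
    """Hypothetical sketch: suppress class-irrelevant patch features
    using a text-modal prior (e.g., a CLIP-style class-name embedding).

    patch_feats: (N, D) array of image patch features
    text_embed:  (D,) text embedding of the target class
    """
    # Cosine similarity between each patch and the text prior
    p = patch_feats / np.linalg.norm(patch_feats, axis=1, keepdims=True)
    t = text_embed / np.linalg.norm(text_embed)
    sim = p @ t                                   # (N,) alignment scores
    # Soft gate: patches aligned with the class concept pass through,
    # co-occurring context (low alignment) is attenuated
    gate = 1.0 / (1.0 + np.exp(-sim / temperature))
    return patch_feats * gate[:, None]
```

For example, a patch feature parallel to the text embedding keeps nearly its full magnitude, while an orthogonal (context) patch is roughly halved by the gate; the temperature controls how sharply the gate separates the two.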
ISSN: 1520-9210, 1941-0077
DOI: 10.1109/TMM.2023.3339387