A²Pt: Anti-Associative Prompt Tuning for Open Set Visual Recognition
Published in: IEEE Transactions on Multimedia, 2024, Vol. 26, pp. 8419-8431
Main authors:
Format: Article
Language: English
Abstract: Multi-modality pre-trained models (PTMs) have considerably boosted performance on a broad range of computer vision tasks. Still, they have not been purposefully explored for open set recognition (OSR) when PTMs are applied to downstream recognition tasks. Directly fine-tuning or prompt tuning PTMs on closed-set classification tasks inevitably suffers from data bias: the model learns target-class-irrelevant, co-occurring contextual information, which leads to over-confident predictions on unknown samples. In this paper, we propose a simple yet effective approach, termed Anti-Associative Prompt Tuning (A²Pt), for learning compact and accurate class-related representations with few class-irrelevant associations from context, using multi-modal priors. Specifically, a cross-modal guided activation module refines the class-aware representation and suppresses associations from co-occurring contexts by incorporating text-modal information. We further design an anti-association calibration module that obtains compact class-aware and class-irrelevant representations, respectively, by introducing two additional objective functions. Extensive experiments on publicly available benchmarks, including the CIFAR series, TinyImageNet, and ImageNet-21K-P, show that the proposed A²Pt achieves substantial and consistent performance gains over both state-of-the-art OSR and PTM prompt tuning approaches.
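The abstract names two components without giving their formulation, so the following is only a minimal illustrative sketch of how such a design could look: a cross-modal guided activation step in which text-prompt embeddings gate the visual features into class-aware and class-irrelevant parts, plus two auxiliary losses standing in for the anti-association calibration. All module names, dimensions, and loss weights here are assumptions for the example, not the paper's actual implementation.

```python
# Illustrative sketch only; NOT the paper's released code.
# (1) Cross-modal guided activation: text embeddings gate visual features.
# (2) Anti-association calibration: two extra losses that pull the class-aware
#     part toward the text prior and decorrelate the context part from it.

import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossModalGuidedActivation(nn.Module):
    """Gate visual features with a text-derived class prior (assumed design)."""

    def __init__(self, d: int):
        super().__init__()
        self.gate = nn.Linear(d, d)

    def forward(self, visual_feat: torch.Tensor, text_feat: torch.Tensor):
        # visual_feat: (B, d) image features from a frozen PTM image encoder
        # text_feat:   (B, d) class-prompt embeddings from the text encoder
        attn = torch.sigmoid(self.gate(text_feat))      # per-dimension gate in [0, 1]
        class_aware = visual_feat * attn                # keep class-related channels
        class_irrelevant = visual_feat * (1.0 - attn)   # residual co-occurring context
        return class_aware, class_irrelevant


def anti_association_losses(class_aware, class_irrelevant, text_feat):
    """Two auxiliary objectives (a plausible stand-in for the paper's terms)."""
    z_a = F.normalize(class_aware, dim=-1)
    z_c = F.normalize(class_irrelevant, dim=-1)
    z_t = F.normalize(text_feat, dim=-1)
    # Pull the class-aware representation toward the text prior ...
    loss_align = (1.0 - (z_a * z_t).sum(dim=-1)).mean()
    # ... and suppress any association between the context part and that prior.
    loss_anti = (z_c * z_t).sum(dim=-1).abs().mean()
    return loss_align, loss_anti


if __name__ == "__main__":
    # Usage: combine with the usual closed-set cross-entropy on the class-aware branch.
    B, d = 4, 512
    module = CrossModalGuidedActivation(d)
    v, t = torch.randn(B, d), torch.randn(B, d)
    z_a, z_c = module(v, t)
    l_align, l_anti = anti_association_losses(z_a, z_c, t)
    total = l_align + 0.5 * l_anti  # loss weight is illustrative
    print(total.item())
```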
ISSN: 1520-9210, 1941-0077
DOI: 10.1109/TMM.2023.3339387