Test-Time Model Adaptation for Visual Question Answering With Debiased Self-Supervisions

Bibliographic details
Published in: IEEE Transactions on Multimedia, 2024, Vol. 26, p. 2137-2147
Main authors: Wen, Zhiquan; Niu, Shuaicheng; Li, Ge; Wu, Qingyao; Tan, Mingkui; Wu, Qi
Format: Article
Language: English
Description
Summary: Visual question answering (VQA) is a prevalent real-world task and plays an essential role in helping blind people understand the physical world. However, because of real-world complexity, VQA test samples may come from a different distribution than the training data, resulting in unavoidable performance degradation. A similar issue exists in image recognition, where one of the most effective recent solutions is test-time adaptation (TTA). TTA adapts a trained model at test time using only test samples, which suggests a way to alleviate the analogous issue in VQA. However, naively applying existing TTA methods (e.g., test-time entropy minimisation) to VQA is imperfect and achieves only marginal performance gains. The reason is that prior methods do not consider the special nature of the VQA problem and ignore that 1) biased samples in the dataset may have negative effects on test-time model adaptation, and 2) the model may have captured the biases in the dataset. In this paper, we propose Test-time Debiased Self-supervised (TDS) learning objectives for VQA model adaptation. Specifically, we minimise the entropy for unbiased test samples. To identify these samples, we construct a negative sample for each test sample and regard the test sample as unbiased if the VQA model outputs different answers for the test sample and its negative counterpart. We also exclude samples with high prediction entropy from adaptation, making the test-time gradients more reliable. To keep the model from excessively fitting the superficial correlations of the biased samples, we use the biased samples and their negative counterparts to assist the adaptation. Extensive experiments on the VQA-CP v1 and VQA-CP v2 datasets demonstrate the effectiveness of TDS.
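The sample-selection rule described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy probabilities, and the `max_entropy` cut-off are assumptions made for illustration, and the summary does not say how negative samples are constructed or how biased samples enter the objective, so both are left out. The sketch only shows the two filters (answer changes under the negative sample, prediction entropy below a cut-off) and the mean entropy that would be minimised over the kept samples.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one answer distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def argmax(probs):
    """Index of the most probable answer."""
    return max(range(len(probs)), key=lambda i: probs[i])


def select_unbiased(probs, neg_probs, max_entropy):
    """Return indices of test samples treated as unbiased and reliable.

    probs[i]     : model's answer distribution for test sample i
    neg_probs[i] : answer distribution for its negative counterpart
    max_entropy  : hypothetical cut-off; samples at or above it are skipped
    """
    keep = []
    for i, (p, q) in enumerate(zip(probs, neg_probs)):
        answer_changed = argmax(p) != argmax(q)  # unbiased if the answer differs
        confident = entropy(p) < max_entropy     # drop high-entropy samples
        if answer_changed and confident:
            keep.append(i)
    return keep


def entropy_objective(probs, keep):
    """Mean prediction entropy over the kept samples (the quantity to minimise)."""
    if not keep:
        return 0.0
    return sum(entropy(probs[i]) for i in keep) / len(keep)


# Toy distributions over three candidate answers (made-up numbers).
probs = [[0.9, 0.05, 0.05], [0.8, 0.1, 0.1], [0.4, 0.3, 0.3]]
neg_probs = [[0.1, 0.8, 0.1], [0.7, 0.2, 0.1], [0.3, 0.4, 0.3]]

keep = select_unbiased(probs, neg_probs, max_entropy=0.8)
loss = entropy_objective(probs, keep)
print(keep, loss)
```

In this toy batch only the first sample is kept: the second gives the same answer for the sample and its negative counterpart, so it counts as biased, and the third changes its answer but its entropy is above the assumed cut-off. In an actual adaptation loop the objective would be computed on model outputs and backpropagated, which this sketch does not attempt.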
ISSN: 1520-9210, 1941-0077
DOI: 10.1109/TMM.2023.3292597