Towards More Faithful Natural Language Explanation Using Multi-Level Contrastive Learning in VQA
Format: | Article |
Language: | English |
Abstract: | Natural language explanation in visual question answering (VQA-NLE) aims to
explain the decision-making process of models by generating natural language
sentences to increase users' trust in black-box systems. Existing post-hoc
methods have achieved significant progress in obtaining plausible
explanations. However, such post-hoc explanations are not always aligned with
human logical inference, suffering from three issues: 1) Deductive
unsatisfiability: the generated explanations do not logically lead to the
answer; 2) Factual inconsistency: the model fabricates counterfactual
explanations for answers without considering the facts in the image; and 3)
Semantic perturbation insensitivity: the model cannot recognize the semantic
changes caused by small perturbations. These problems reduce the faithfulness
of explanations generated by models. To address the above issues, we propose a
novel self-supervised **M**ulti-level **C**ontrastive **L**earning-based
natural language **E**xplanation model (MCLE) for VQA with semantic-level,
image-level, and instance-level factual and counterfactual samples. MCLE
extracts discriminative features and aligns the feature space of explanations
with that of the visual question and answer to generate more consistent
explanations. We conduct extensive experiments, ablation analyses, and a case
study to demonstrate the effectiveness of our method on two VQA-NLE
benchmarks. |
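The abstract names the training signal only at a high level. As a rough sketch of the kind of objective it describes, the PyTorch snippet below implements an InfoNCE-style contrastive loss in which explanation embeddings are pulled toward factual visual-question-answer embeddings and pushed away from counterfactual ones, summed over the three sample granularities. All names (`info_nce`, `multi_level_loss`, `cf_semantic`, ...) and details such as the temperature and per-level weighting are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, negatives, tau=0.07):
    """One contrastive term: pull `anchor` toward its factual `positive`
    and push it away from `negatives` (counterfactual samples).

    anchor:    (B, D) explanation embeddings
    positive:  (B, D) factual visual-question-answer embeddings
    negatives: (B, K, D) counterfactual embeddings at one level
    """
    # Normalize so dot products become cosine similarities.
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)

    pos_sim = (anchor * positive).sum(dim=-1, keepdim=True) / tau  # (B, 1)
    neg_sim = torch.einsum("bd,bkd->bk", anchor, negatives) / tau  # (B, K)
    logits = torch.cat([pos_sim, neg_sim], dim=1)                  # (B, 1+K)

    # The factual pair always sits in column 0 of the logits.
    labels = torch.zeros(anchor.size(0), dtype=torch.long, device=anchor.device)
    return F.cross_entropy(logits, labels)

def multi_level_loss(expl, factual, cf_semantic, cf_image, cf_instance, tau=0.07):
    # One hedged reading of "multi-level": an unweighted sum of the same
    # objective over semantic-, image-, and instance-level negatives.
    return sum(info_nce(expl, factual, cf, tau)
               for cf in (cf_semantic, cf_image, cf_instance))
```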
DOI: | 10.48550/arxiv.2312.13594 |