TASA: Deceiving Question Answering Models by Twin Answer Sentences Attack
Format: Article
Language: English
Abstract: We present Twin Answer Sentences Attack (TASA), an adversarial attack method for question answering (QA) models that produces fluent and grammatical adversarial contexts while maintaining gold answers. Despite phenomenal progress on general adversarial attacks, few works have investigated the vulnerability of QA models or attacks tailored specifically to them. In this work, we first explore the biases in existing QA models and discover that they mainly rely on keyword matching between the question and the context, while ignoring the contextual relations relevant to answer prediction. Based on these two biases, TASA attacks the target model in two ways: (1) it lowers the model's confidence in the gold answer with a perturbed answer sentence; (2) it misguides the model towards a wrong answer with a distracting answer sentence. Equipped with designed beam search and filtering methods, TASA generates more effective attacks than existing textual attack methods while sustaining the quality of the contexts, as shown in extensive experiments on five QA datasets and in human evaluations.
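The two-step attack described in the abstract can be made concrete with a short sketch. The following is a minimal illustration, not the authors' implementation: it scores attacked contexts by how far they lower an extractive QA model's confidence in the gold answer, and runs a small beam search over candidate edits. The Hugging Face `question-answering` pipeline is a real API; the candidate generator `candidate_contexts` (which in the paper's setting would produce perturbed answer sentences and appended distracting twin sentences) is a hypothetical placeholder, and the paper's fluency and grammaticality filtering is omitted.

```python
# Illustrative sketch of TASA's two-fold attack objective (assumptions noted
# above): lower the model's confidence in the gold answer, so that it drifts
# towards a wrong span planted by a distracting sentence.
from transformers import pipeline

qa = pipeline("question-answering",
              model="distilbert-base-cased-distilled-squad")

def gold_confidence(question: str, context: str, gold: str) -> float:
    """Model's score when it still predicts the gold answer, else 0."""
    pred = qa(question=question, context=context)
    return pred["score"] if gold.lower() in pred["answer"].lower() else 0.0

def beam_search_attack(question, context, gold, candidate_contexts,
                       beam_size=5, max_steps=3):
    """Keep the beam_size attacked contexts that most reduce gold-answer
    confidence; stop early once the model no longer predicts the gold answer.
    `candidate_contexts(c)` is a hypothetical hook returning edited contexts."""
    beam = [context]
    for _ in range(max_steps):
        pool = {c2 for c in beam for c2 in candidate_contexts(c)}
        ranked = sorted(pool, key=lambda c: gold_confidence(question, c, gold))
        if not ranked:
            break
        beam = ranked[:beam_size]
        if gold_confidence(question, beam[0], gold) == 0.0:
            break  # attack succeeded: gold answer is no longer predicted
    return beam[0]
```

In the paper, the candidates come from two sources matching the two biases: perturbations of the original answer sentence (weakening question-context keyword overlap) and a crafted distracting answer sentence built around a wrong answer; the beam search and filtering keep the edited context fluent while the gold answer remains valid.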
DOI: 10.48550/arxiv.2210.15221