Reproducing HotFlip for Corpus Poisoning Attacks in Dense Retrieval
Format: | Article |
Language: | English |
Abstract: | HotFlip is a topical gradient-based word substitution method for attacking
language models. Recently, this method has also been applied to attack
retrieval systems by generating malicious passages that are injected into a
corpus, i.e., corpus poisoning. However, HotFlip is known to be computationally
inefficient: most of its running time is spent accumulating gradients for
each query-passage pair during the adversarial token generation phase, which
makes it impossible to generate an adequate number of adversarial passages in a
reasonable amount of time. Moreover, the attack method itself assumes access to
a set of user queries, a strong assumption that does not reflect how
real-world adversarial attacks are usually performed. In this paper, we first
significantly boost the efficiency of HotFlip, reducing the adversarial
generation time from 4 hours per document to only 15 minutes on the same
hardware. We further contribute experiments and analysis on two additional
tasks: (1) transfer-based black-box attacks, and (2) query-agnostic attacks.
Whenever possible, we compare the original method with our improved version.
Our experiments demonstrate that HotFlip can effectively attack a variety of
dense retrievers, although its attack performance tends to diminish against
more advanced and more recent retrievers. Interestingly, we observe that while
HotFlip performs poorly in a black-box setting, indicating limited capacity for
generalization, in query-agnostic scenarios its performance is correlated with
the volume of injected adversarial passages. |
DOI: | 10.48550/arxiv.2501.04802 |
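The abstract attributes most of HotFlip's cost to accumulating gradients over every query-passage pair. The sketch below is a minimal illustration of one HotFlip substitution step for corpus poisoning under a loudly stated assumption: a hypothetical toy encoder that mean-pools token embeddings and scores passages by dot product against the attacker's query embeddings. It is not the authors' implementation and does not reproduce their speed-up, which the abstract does not describe. The names `passage_embedding`, `poisoning_loss` and `hotflip_step`, and all sizes, are illustrative. The sketch shows the gradient being averaged over all queries. Vocabulary tokens are then ranked by the first-order estimate (e_v - e_old) · grad, and the top-k candidates are re-scored with the exact loss.

```python
import numpy as np


def passage_embedding(token_ids, emb):
    # Hypothetical toy encoder (assumption): mean-pool the token embeddings.
    return emb[token_ids].mean(axis=0)


def poisoning_loss(token_ids, emb, queries):
    # Negative mean dot-product similarity between the adversarial passage and
    # the attacker's query embeddings; lower is better for the attacker.
    p = passage_embedding(token_ids, emb)
    return -float((queries @ p).mean())


def hotflip_step(token_ids, emb, queries, k=10, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    n = len(token_ids)
    # Gradient of the loss w.r.t. one token embedding, accumulated (averaged)
    # over all queries. For this toy encoder it is closed-form and identical at
    # every position; a real encoder needs backprop per query-passage pair.
    grad = -queries.mean(axis=0) / n
    pos = int(rng.integers(n))  # position to flip, chosen at random
    # First-order estimate of the loss change from swapping in token v is
    # (e_v - e_old) . grad, so rank the vocabulary by e_v . grad (ascending).
    candidates = np.argsort(emb @ grad)[:k]
    best_ids = token_ids
    best_loss = poisoning_loss(token_ids, emb, queries)
    for v in candidates:
        trial = token_ids.copy()
        trial[pos] = v
        loss = poisoning_loss(trial, emb, queries)  # exact re-scoring
        if loss < best_loss:
            best_ids, best_loss = trial, loss
    return best_ids, best_loss


# Usage on random toy data (illustrative sizes only).
rng = np.random.default_rng(0)
vocab_size, dim, n_tokens = 50, 8, 6
emb = rng.normal(size=(vocab_size, dim))
queries = rng.normal(size=(20, dim))
ids = rng.integers(vocab_size, size=n_tokens)
start = poisoning_loss(ids, emb, queries)
for _ in range(30):
    ids, loss = hotflip_step(ids, emb, queries, k=5, rng=rng)
```

Because the toy encoder is linear, the first-order estimate happens to be exact here. With a transformer encoder it is only an approximation, which is why the candidates are re-evaluated with the true loss before a flip is accepted.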