Can Go AIs be adversarially robust?
Format: Article
Language: English
Abstract: Prior work found that superhuman Go AIs can be defeated by simple adversarial
strategies, especially "cyclic" attacks. In this paper, we study whether adding
natural countermeasures can achieve robustness in Go, a favorable domain for
robustness since it benefits from incredible average-case capability and a
narrow, innately adversarial setting. We test three defenses: adversarial
training on hand-constructed positions, iterated adversarial training, and
changing the network architecture. We find that though some of these defenses
protect against previously discovered attacks, none withstand freshly trained
adversaries. Furthermore, most of the reliably effective attacks these
adversaries discover are different realizations of the same overall class of
cyclic attacks. Our results suggest that building robust AI systems is
challenging even with extremely superhuman systems in some of the most
tractable settings, and highlight two key gaps: efficient generalization in
defenses, and diversity in training. For interactive examples of attacks and a
link to our codebase, see https://goattack.far.ai.
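Of the three defenses, iterated adversarial training is the most algorithmic: it alternates between training a fresh adversary against the current victim and fine-tuning the victim against any exploit found. Below is a minimal Python sketch of such a loop, assuming the caller supplies the training and evaluation routines; the names `train_adversary`, `finetune_victim`, and `win_rate` are hypothetical placeholders, not functions from the paper's codebase.

```python
from typing import Callable, TypeVar

Policy = TypeVar("Policy")


def iterated_adversarial_training(
    victim: Policy,
    train_adversary: Callable[[Policy], Policy],
    finetune_victim: Callable[[Policy, Policy], Policy],
    win_rate: Callable[[Policy, Policy], float],
    n_rounds: int = 9,
    exploit_threshold: float = 0.5,
) -> Policy:
    """Alternate attack and defense rounds against a Go-playing victim.

    Each round trains a fresh adversary from scratch against the current
    victim; if that adversary wins more than `exploit_threshold` of its
    games, the victim is fine-tuned against it and the loop continues.
    """
    for _ in range(n_rounds):
        adversary = train_adversary(victim)  # attack phase: search for a fresh exploit
        if win_rate(adversary, victim) <= exploit_threshold:
            break  # no reliable exploit found; victim looks robust so far
        victim = finetune_victim(victim, adversary)  # defense phase: patch the exploit
    return victim
```

Note that terminating this loop only demonstrates robustness against the adversaries trained so far; the abstract's central finding is that newly trained adversaries kept rediscovering variants of the cyclic attack.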
DOI: 10.48550/arxiv.2406.12843