Spinning Sequence-to-Sequence Models with Meta-Backdoors

We investigate a new threat to neural sequence-to-sequence (seq2seq) models: training-time attacks that cause models to "spin" their output and support a certain sentiment when the input contains adversary-chosen trigger words. For example, a summarization model will output positive summar...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	arXiv.org 2022-10
Hauptverfasser:	Bagdasaryan, Eugene, Shmatikov, Vitaly
Format:	Artikel
Sprache:	eng
Schlagworte:	Context Data mining Model accuracy
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Schreiben Sie den ersten Kommentar!