FAAG: Fast Adversarial Audio Generation through Interactive Attack Optimisation
Main authors: | , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Automatic speech recognition (ASR) services inherit the
vulnerabilities of deep neural networks, such as crafted adversarial examples.
Existing methods often suffer from low efficiency because the target phrases
are added to the entire audio sample, resulting in high demand for
computational resources. This paper proposes FAAG, an iterative
optimization-based method that generates targeted adversarial examples quickly.
By injecting the noise over only the beginning part of the audio, FAAG
generates high-quality adversarial audio quickly and with a high success rate.
Specifically, we use the audio's logits output to map each character in the
transcription to an approximate frame position in the audio. An adversarial
example can thus be generated by FAAG in approximately two minutes using CPUs
only, and in around ten seconds with one GPU, while maintaining an average
success rate over 85%. FAAG speeds up adversarial example generation by around
60% compared with the baseline method. Furthermore, we found that appending
benign audio to any suspicious example can effectively defend against the
targeted adversarial attack. We hope that this work paves the way for inventing
new adversarial attacks against speech recognition under computational
constraints. |
---|---|
DOI: | 10.48550/arxiv.2202.05416 |
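
The summary's step of mapping each transcription character to an approximate frame position can be illustrated with a greedy CTC-style decode over per-frame logits. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a CTC-trained recogniser whose blank symbol sits at index 0, and the names `char_frame_positions`, `prefix_samples`, `symbols`, and `samples_per_frame` are hypothetical and introduced only for illustration.

```python
from typing import List, Sequence, Tuple

BLANK = 0  # assumption: index of the CTC blank symbol


def char_frame_positions(
    logits: Sequence[Sequence[float]], symbols: str, blank: int = BLANK
) -> List[Tuple[str, int]]:
    """Greedy decode that records the first frame at which each character is emitted.

    `symbols[i]` is the character for class index i (the blank slot is a placeholder).
    """
    positions: List[Tuple[str, int]] = []
    prev = blank
    for frame_idx, frame in enumerate(logits):
        # Most likely class for this frame (first index wins on ties).
        best = max(range(len(frame)), key=frame.__getitem__)
        # Emit only when a non-blank class starts, so repeated frames collapse.
        if best != blank and best != prev:
            positions.append((symbols[best], frame_idx))
        prev = best
    return positions


def prefix_samples(
    positions: List[Tuple[str, int]],
    n_chars: int,
    total_frames: int,
    samples_per_frame: int,
) -> int:
    """Approximate number of leading audio samples that cover the first n_chars characters."""
    # The prefix ends where the next character begins, or at the end of the audio.
    end_frame = positions[n_chars][1] if n_chars < len(positions) else total_frames
    return end_frame * samples_per_frame


# Toy logits for five frames over the classes (blank, "a", "b").
toy_logits = [
    [0.9, 0.05, 0.05],  # blank
    [0.1, 0.8, 0.1],    # "a"
    [0.1, 0.8, 0.1],    # "a" again, collapsed into the previous emission
    [0.9, 0.05, 0.05],  # blank
    [0.1, 0.1, 0.8],    # "b"
]
toy_positions = char_frame_positions(toy_logits, "_ab")
# 320 samples per frame corresponds to a 20 ms hop at 16 kHz (an assumed setting).
toy_prefix = prefix_samples(toy_positions, 1, len(toy_logits), 320)
```

With such a mapping, a perturbation could in principle be restricted to the leading `toy_prefix` samples instead of the whole clip, which is the kind of saving the summary attributes to injecting noise over the beginning part of the audio.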