DRAW: Dual-Decoder-Based Robust Audio Watermarking Against Desynchronization and Replay Attacks

Detailed description

Bibliographic details
Published in:	IEEE Transactions on Information Forensics and Security, 2024, Vol. 19, pp. 6529-6544
Main authors:	Li, Bin; Chen, Jincheng; Xu, Yuxiong; Li, Weixiang; Liu, Zhenghui
Format:	Article
Language:	English
Subjects:	Background noise; Coders; Codes; Decoders; Decoding; deep learning; desynchronization attack; Digital watermarking; Distortion; Feature extraction; Learning; Payloads; replay attack; Robust audio watermarking; Robustness; Synchronism; Synchronization; Watermarking

Description
Abstract:	Digital watermarking is a widely adopted authentication technique and one of its primary concerns in practical usage is robustness. However, existing audio watermarking methods face challenges in countering desynchronization attacks and replay attacks, which can easily lead to watermark extraction failure. In this paper, we introduce a learning-based scheme, named DRAW (Dual-decoder-based Robust Audio Watermarking), to overcome the robustness issue. Specifically, a watermark encoder embeds payloads together with synch codes into audio frames with high imperceptibility. For reliable watermark extraction, two separate decoders are designed, one for Fixed Length Synchronization Decoding (FLSD) and the other for Variable Length Payload Decoding (VLPD). The dual decoders are trained with the encoder with different weights in the loss function by considering their different roles for watermark extraction. To better resist attacks, a distortion layer is incorporated in-between the encoder and the decoders to simulate distortion and facilitate end-to-end learning. For the more challenging replay attacks, a pre-trained Replay Attack Simulation Network (RASN) is applied to simplify the simulation of re-recording with background noise and reverberation. Extensive experimental results show that the proposed method can be applied to variable-length audio clips with better auditory quality, and it outperforms state-of-the-art methods in robustness against various kinds of attacks.
ISSN:	1556-6013 1556-6021
DOI:	10.1109/TIFS.2024.3416047
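
Illustrative note: the abstract states that the two decoders are trained jointly with the encoder, using different weights in the loss function. The sketch below shows one way such a weighted objective could be combined. It is not taken from the paper: the function names, the choice of mean squared error and binary cross-entropy, and the weight values `w_audio`, `w_sync`, and `w_payload` are all assumptions made for illustration.

```python
import math


def mse(a, b):
    """Mean squared error between two equal-length sample sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def bce(bits, probs, eps=1e-7):
    """Binary cross-entropy between target bits (0/1) and predicted probabilities."""
    total = 0.0
    for b, p in zip(bits, probs):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= b * math.log(p) + (1 - b) * math.log(1.0 - p)
    return total / len(bits)


def total_loss(host, marked, sync_bits, sync_probs, payload_bits, payload_probs,
               w_audio=1.0, w_sync=2.0, w_payload=1.0):
    """Weighted sum of three terms (all weights are hypothetical values):
    - audio term: distortion between host and watermarked audio
    - sync term: error of the synchronization decoder's output
    - payload term: error of the payload decoder's output
    """
    return (w_audio * mse(host, marked)
            + w_sync * bce(sync_bits, sync_probs)
            + w_payload * bce(payload_bits, payload_probs))
```

With this form, raising `w_sync` relative to `w_payload` penalizes synchronization errors more heavily during joint training. Which decoder the paper actually weights more, and by how much, is not stated in the abstract.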