Pairwise open-sourced dataset protection based on adaptive blind watermarking

Bibliographic Details
Published in: Applied intelligence (Dordrecht, Netherlands), 2023-07, Vol. 53 (14), p. 17391-17410
Main authors: Pang, Zilong, Wang, Mingxu, Cao, Lvchen, Chai, Xiuli, Gan, Zhihua
Format: Article
Language: English
Online access: Full text
Description
Abstract: Collecting and labeling the open-sourced datasets that drive the development of deep learning is expensive, so it is important to design an efficient protection algorithm that can detect when an open-sourced dataset is used illegally. In this paper, a protection algorithm based on adaptive robust blind watermarking is proposed that is suitable for multiple paired open-sourced datasets, and evaluation criteria for the algorithm are defined. Specifically, in the embedding stage, high concealment of the watermark is achieved by combining a UNet with a double GAN, taking into account both the local and global features of the carrier and the watermark image. A preprocessing network within the embedding network adapts to different watermark sizes. In the extraction stage, a modified feature-sharing UNet with a GAN ensures the robustness of the extraction network, and paired datasets are used for training to ensure accurate extraction of watermarks. After a target model is trained on the watermarked dataset, its inference output contains watermark information. When a suspicious model is believed to have been trained illegally on the dataset, this can be verified from the watermark extracted from the suspicious model's inference output. We evaluate our method on three target models across nine datasets. The results show that our framework successfully verifies illegal dataset use, without a noticeable impact on the target model's task when it is trained on the watermarked dataset.
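To make the embed-and-extract pipeline described in the abstract concrete, the following is a minimal, hypothetical PyTorch sketch and not the authors' implementation: the paper's UNet-plus-double-GAN embedder and feature-sharing UNet extractor are reduced here to small convolutional stand-ins, and random tensors stand in for the carrier images and the watermark logo. All module names, layer sizes, and hyperparameters are illustrative assumptions.

```python
# Hypothetical sketch of the embed/extract idea from the abstract.
# The paper's UNet + double-GAN embedder and feature-sharing UNet
# extractor are replaced by tiny conv nets so the data flow is clear.
import torch
import torch.nn as nn

class Embedder(nn.Module):
    """Fuses a carrier image and a resized watermark into a stego image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + 1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
        )

    def forward(self, carrier, watermark):
        # Resize the watermark to the carrier resolution, standing in
        # for the paper's preprocessing network that adapts watermark size.
        wm = nn.functional.interpolate(watermark, size=carrier.shape[-2:])
        # Residual embedding keeps the stego image close to the carrier.
        return carrier + self.net(torch.cat([carrier, wm], dim=1))

class Extractor(nn.Module):
    """Recovers the watermark from a (possibly model-processed) image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, image):
        return self.net(image)

embed, extract = Embedder(), Extractor()
opt = torch.optim.Adam(
    list(embed.parameters()) + list(extract.parameters()), lr=1e-3
)

carrier = torch.rand(4, 3, 64, 64)    # stand-in carrier images
watermark = torch.rand(4, 1, 32, 32)  # stand-in watermark logo

for step in range(100):
    stego = embed(carrier, watermark)
    recovered = extract(stego)
    target = nn.functional.interpolate(watermark, size=recovered.shape[-2:])
    # Concealment term: stego should stay visually close to the carrier.
    # Robustness term: the watermark should remain recoverable.
    loss = (nn.functional.mse_loss(stego, carrier)
            + nn.functional.mse_loss(recovered, target))
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In the verification scenario the abstract describes, the same extractor would be applied not to the stego image itself but to the inference output of a suspicious model, checking whether the recovered pattern matches the embedded watermark.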
ISSN: 0924-669X, 1573-7497
DOI: 10.1007/s10489-022-04416-0