Dynamic differential privacy-based dataset condensation


Detailed description

Bibliographic details
Published in: Neurocomputing (Amsterdam) 2024-12, Vol. 608, p. 128394, Article 128394
Authors: Wu, Zhaoxuan, Gao, Xiaojing, Qian, Yongfeng, Hao, Yixue, Chen, Min
Format: Article
Language: English
Online access: Full text
Description
Abstract: With the continuous expansion of data scale, data condensation technology has emerged as a means to reduce storage, time, and energy costs. Data condensation generates a synthetic dataset of reduced size on which models can be trained to performance comparable to training on the original dataset. Nevertheless, data condensation also raises privacy issues. Although several approaches have been proposed to preserve privacy in data condensation, privacy protection in this setting remains underexplored. Furthermore, to the best of our knowledge, no existing approach performs differentially private dataset condensation with dynamic privacy parameters, even though a fixed-parameter strategy introduces unnecessary noise. Most approaches inject noise of fixed variance into the gradients of all layers using predefined privacy parameters, which can significantly degrade model accuracy. In this paper, we investigate alternative approaches to data condensation with differential privacy (DP) that guarantee DP while minimizing the noise added to gradients and improving model accuracy. First, we develop a dynamic threshold method that reduces the noise added to gradients in the later stages of training by using a clipping threshold that decreases with the training round. Second, noise injection in our method is not arbitrary, as in conventional approaches; instead, it is calibrated to the maximum gradient magnitude after clipping. This ensures that only minimal noise increments are introduced, mitigating the accuracy loss and parameter instability that excessive noise injection can cause. Finally, our privacy analysis confirms that the proposed method provides a rigorous privacy guarantee.
Extensive evaluations on different datasets demonstrate that our approach improves accuracy over existing DP data condensation techniques under the same privacy budget and a specified clipping threshold.
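The two ideas the abstract describes, a clipping threshold that shrinks with the training round and Gaussian noise scaled to that (post-clipping) bound rather than to a fixed constant, can be illustrated with a minimal DP gradient-aggregation sketch. This is an assumption-laden illustration, not the authors' implementation: the function name `dp_grad_step`, the geometric decay schedule, and all parameter defaults are hypothetical choices made here for clarity.

```python
import numpy as np

def dp_grad_step(per_sample_grads, round_t, c0=1.0, decay=0.99,
                 sigma=1.0, rng=None):
    """One differentially private gradient aggregation step with a
    dynamic clipping threshold. Illustrative sketch only; the schedule
    and names are assumptions, not the paper's exact method."""
    rng = np.random.default_rng() if rng is None else rng

    # Dynamic threshold: decreases with the training round, so less
    # noise is injected in later stages of training.
    c_t = c0 * (decay ** round_t)

    # Per-sample L2 clipping so each contribution has norm at most c_t.
    norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
    clipped = per_sample_grads * np.minimum(1.0, c_t / np.maximum(norms, 1e-12))

    # Gaussian noise calibrated to c_t, the maximum possible per-sample
    # gradient magnitude after clipping (rather than a fixed constant).
    noise = rng.normal(0.0, sigma * c_t, size=clipped.shape[1])

    # Noisy average of the clipped per-sample gradients.
    return clipped.mean(axis=0) + noise / len(clipped)
```

Because the noise standard deviation tracks `sigma * c_t`, the injected noise shrinks together with the clipping bound as training progresses, which is the intuition behind the accuracy gains the abstract claims.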
ISSN:0925-2312
DOI:10.1016/j.neucom.2024.128394