Learning to Transform Sensitive Data with Variable Distribution Preservation

Preserving distributions of data values of a data asset in a data anonymization operation is provided. Anonymizing data values is performed by transforming sensitive data in a set of columns over rows of the data asset while preserving distribution of the data values in the set of transformed column...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: Raphael, Roger C, Desai, Rajesh M, Qiao, Mu, Natarajan, Arjun, Aggarwal, Aniya, KUNDU, ASHISH, Payne, Joshua F
Format: Patent
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Preserving distributions of data values of a data asset in a data anonymization operation is provided. Anonymizing data values is performed by transforming sensitive data in a set of columns over rows of the data asset while preserving distribution of the data values in the set of transformed columns to a defined degree using a set of autoencoders and loss function. The autoencoders are base trained from preexisting data in a data assets catalog and actively trained during data dissemination. Parametric coefficients of the loss function are configured and the threshold is generated using policies from an enforcement decision for the data asset and data consumer. The loss function value of a selected row is compared to the threshold. Transformed data values of the selected row are transcribed to an output row when the loss function value is greater than the threshold and disseminated to the data consumer.