Synthesizing de-identified test data

Bibliographic details
Main authors: BRAGIN, SERGEY; LEYVACHE, KARINE; HOLOHAN, NICHOLAS
Format: Patent
Language: Chinese; English
Description
Abstract: Embodiments include a method in which one or more processors receive an organic data set and a domain knowledge base. The one or more processors identify private data entities present within the organic data set and determine statistical attributes of those private data entities. The one or more processors create a plurality of test data templates by removing the private data entities from the organic data set. The one or more processors select, from the domain knowledge base, synthetic data entities that match the data types of the removed private data entities and align with the statistical attributes of the private data entities. The one or more processors then generate synthetic test data by inserting, into each test data template, synthetic data entities of matching data types in place of the respective removed private data entities.
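The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the regex detectors in `PATTERNS`, the contents of `KNOWLEDGE_BASE`, and the function names `make_template` and `synthesize` are all assumptions made for illustration. The only statistical attribute tracked here is the count of entities per data type, which the one-for-one substitution preserves; the record itself does not say which attributes the method uses.

```python
import random
import re
from collections import Counter

# Hypothetical detectors: simple regexes standing in for whatever
# private-entity recognition an actual embodiment would use.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.]+@[\w.]+\.\w+"),
    "PHONE": re.compile(r"\b\d{3}-\d{4}\b"),
}

# Hypothetical domain knowledge base: synthetic values keyed by data type.
KNOWLEDGE_BASE = {
    "EMAIL": ["user1@example.com", "user2@example.org"],
    "PHONE": ["555-0100", "555-0199"],
}

# Matches only the typed placeholders that make_template emits.
PLACEHOLDER = re.compile(r"\{(" + "|".join(PATTERNS) + r")\}")


def make_template(record):
    """Remove private entities, leaving typed placeholders such as {EMAIL}.

    Returns the template and the list of data types that were removed.
    """
    found = []
    for dtype, pattern in PATTERNS.items():
        def mark(match, dtype=dtype):
            found.append(dtype)
            return "{" + dtype + "}"
        record = pattern.sub(mark, record)
    return record, found


def synthesize(records, seed=0):
    """Build templates, collect per-type counts, and fill in synthetic values."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    templates, stats = [], Counter()
    for record in records:
        template, found = make_template(record)
        templates.append(template)
        stats.update(found)  # statistical attribute: entity count per type

    def fill(match):
        # Pick a synthetic entity whose data type matches the removed one.
        return rng.choice(KNOWLEDGE_BASE[match.group(1)])

    synthetic = [PLACEHOLDER.sub(fill, t) for t in templates]
    return templates, stats, synthetic


if __name__ == "__main__":
    organic = [
        "Contact alice@corp.com or call 867-5309.",
        "Bob: bob.smith@mail.net",
    ]
    templates, stats, synthetic = synthesize(organic)
    print(templates)
    print(dict(stats))
    print(synthetic)
```

Keeping the template as an intermediate artifact means the same templates can be refilled with different synthetic values to produce many test records from one organic record.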