SYSTEM AND METHOD FOR GENERATING A SYNTHETIC DATASET FROM AN ORIGINAL DATASET

Bibliographic details
Main authors: APTEKAR, Jacob; MEZEY, Jason; BEIGI, Mandis; SHAFQUAT, Afrah
Format: Patent
Language: eng; fre; ger
Description
Summary: A method for generating a synthetic dataset from an original dataset includes encoding categorical features of the original dataset, embedding the encoded dataset in a low-dimensional space, selecting a seed record from the embedded dataset, identifying a plurality of nearest neighbor records to the seed record, generating a new record by randomly selecting features from the plurality of nearest neighbor records, and concatenating the new record into the synthetic dataset. For a synthetic dataset that contains N records, which may be the same as or different from the number of records in the original dataset, the selecting, identifying, generating, and concatenating operations are performed a total of N times on the records in the embedded dataset.
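
The steps in the summary can be illustrated with a minimal Python sketch. It is not the patented implementation. The summary does not name an encoding, an embedding method, a distance metric, or a neighbor count, so the sketch assumes one-hot encoding, a PCA projection computed with an SVD, Euclidean distance, and k = 5 neighbors. It also assumes that the seed record counts as one of its own neighbors and that each feature of a new record is copied from the original (unencoded) values of a randomly chosen neighbor. All function and parameter names are hypothetical.

```python
import numpy as np


def one_hot_encode(records, categorical_cols):
    """Build a numeric matrix; categorical columns become one-hot columns (assumed encoding)."""
    blocks = []
    for j in range(len(records[0])):
        col = [r[j] for r in records]
        if j in categorical_cols:
            levels = sorted(set(col))
            blocks.append(np.array([[float(v == lv) for lv in levels] for v in col]))
        else:
            blocks.append(np.array(col, dtype=float).reshape(-1, 1))
    return np.hstack(blocks)


def embed(X, n_components=2):
    """Project onto the leading principal components (assumed embedding method)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def generate_synthetic(records, categorical_cols, n_records, k=5, seed=0):
    """Repeat select, identify, generate, concatenate n_records times."""
    rng = np.random.default_rng(seed)
    Z = embed(one_hot_encode(records, categorical_cols))
    n_features = len(records[0])
    synthetic = []
    for _ in range(n_records):
        # Select a seed record at random from the embedded dataset.
        s = rng.integers(len(records))
        # Identify the k records closest to the seed (the seed itself is included).
        dists = np.linalg.norm(Z - Z[s], axis=1)
        neighbors = np.argsort(dists)[:k]
        # Generate a new record: each feature comes from a randomly chosen neighbor.
        new_record = tuple(records[rng.choice(neighbors)][j] for j in range(n_features))
        # Concatenate the new record into the synthetic dataset.
        synthetic.append(new_record)
    return synthetic


# Toy data for illustration only: (age, sex, group).
records = [(34, "F", "A"), (36, "F", "B"), (61, "M", "A"),
           (59, "M", "B"), (35, "F", "A"), (60, "M", "B")]
synthetic = generate_synthetic(records, categorical_cols={1, 2}, n_records=10, k=3)
```

Here `n_records` plays the role of N in the summary and may differ from the number of original records (10 synthetic records from 6 originals above). Because every feature value is copied from a neighbor of the seed, each synthetic value already occurs in the same column of the original data, while the combination of values within a record may be new.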