One-hot encoder using lazy evaluation of relational statements

Bibliographic details
Main authors: Schmidt, Felix; Vasic, Milos; Nikolic, Marija; Casserini, Matteo
Format: Patent
Language: English
Description
Abstract: A method and one or more non-transitory storage media are provided to train and implement a one-hot encoder. During a training phase, computation of an encoder state is performed by executing a set of relational statements to extract unique categories in a first training data set, associate each unique category with a unique index, and generate a one-hot encoding for each unique category. The set of relational statements is executed by a query optimization engine. Execution of the set of relational statements is postponed until a result of each relational statement is needed, and the query optimization engine implements one or more optimizations when executing the set of relational statements. During an encoding phase, a set of categorical features in a second training data set is encoded based on the encoder state to form a set of encoded categorical features.
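The abstract does not name a particular query engine or API. As a minimal sketch, assuming Python's built-in `sqlite3` module stands in for the query optimization engine and SQL views stand in for the deferred relational statements, the two phases could look like this. The class name `LazyOneHotEncoder`, the view names `categories`, `category_index` and `one_hot`, the sorted-order index assignment and the all-zeros vector for unseen categories are all hypothetical choices for illustration, not details taken from the patent.

```python
import sqlite3


class LazyOneHotEncoder:
    """Hypothetical one-hot encoder whose training logic is a chain of SQL views.

    CREATE VIEW only records a relational statement; SQLite evaluates it
    (planning the combined query with its own optimizer) only when a
    SELECT actually needs the result.
    """

    def __init__(self, conn: sqlite3.Connection, table: str, column: str):
        # Assumption: table and column are trusted identifiers, because they
        # are interpolated into the SQL text rather than bound as parameters.
        self.conn = conn
        self._state = None  # encoder state; computed lazily on first use
        # Statement 1: the unique categories in the first training data set.
        conn.execute(
            f"CREATE VIEW categories AS SELECT DISTINCT {column} AS category "
            f"FROM {table} WHERE {column} IS NOT NULL"
        )
        # Statement 2: a unique 0-based index per category, here its rank in
        # sorted order (an assumption; any one-to-one assignment would work).
        conn.execute(
            "CREATE VIEW category_index AS SELECT c.category, "
            "(SELECT COUNT(*) FROM categories AS o WHERE o.category < c.category) AS idx "
            "FROM categories AS c"
        )
        # Statement 3: the one-hot encoding as (category, position, bit) rows.
        conn.execute(
            "CREATE VIEW one_hot AS SELECT ci.category, pos.idx AS position, "
            "CASE WHEN ci.idx = pos.idx THEN 1 ELSE 0 END AS bit "
            "FROM category_index AS ci CROSS JOIN category_index AS pos"
        )

    def state(self) -> dict:
        """Execute the deferred statements once and cache {category: vector}."""
        if self._state is None:
            rows = self.conn.execute(
                "SELECT category, bit FROM one_hot ORDER BY category, position"
            ).fetchall()
            state = {}
            for category, bit in rows:
                # Rows arrive in position order, so appending builds the vector.
                state.setdefault(category, []).append(bit)
            self._state = state
        return self._state

    def encode(self, values):
        """Encoding phase: map each value to its vector; unseen values get all zeros."""
        state = self.state()
        width = len(state)
        return [list(state.get(v, [0] * width)) for v in values]
```

For example, with a training table whose column holds `red`, `green`, `red`, `blue`, the sketch assigns `blue` to index 0, `green` to 1 and `red` to 2, so `encode(["green", "purple"])` returns `[[0, 1, 0], [0, 0, 0]]`. Because the constructor only declares views, rows inserted into the training table after construction but before the first `encode` or `state` call are still included. How the stacked views are actually planned (for instance, whether they are flattened into a single query) is left to SQLite, which loosely mirrors the optimization step described in the abstract.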