AUTOMATED CLASSIFICATION OF DATASETS USING SEMANTIC TYPE IDENTIFICATION

Bibliographic details
Main authors: Magoon, Scott Howard, Khan, Tufail Ahmed, Baruah, Partha Pratim, Hansoty, Jatin, Goswami, Pranjal
Format: Patent
Language: English

Abstract: A method for automatically classifying datasets is implemented on a computing system. The computing system receives a dataset from a source, and the dataset includes a plurality of data entries. The method includes the following steps. First, a plurality of predetermined semantic types is provided. Second, the data entries are processed to identify each one as one of the semantic types; this processing examines the data entries using two different models. Third, a confidence score is generated for each model based on its examination of the data entries. Fourth, a confidence label is generated from a predetermined combination of the confidence scores. Finally, a classification recommendation for the dataset is generated based on the identified semantic types, and the confidence label is associated with the dataset.
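The abstract does not disclose the two models, the set of semantic types, or the predetermined combination rule. The following sketch shows one way the claimed steps could fit together, but it is illustrative only. It assumes two hypothetical models: a regular-expression matcher (`regex_model`) and a character-profile heuristic (`profile_model`). It also assumes a hypothetical set of semantic types, assumed score thresholds, and a hypothetical type-to-classification mapping (`SENSITIVITY`). None of these names, rules, or values come from the patent.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Hypothetical predetermined semantic types (assumption, not from the patent).
SEMANTIC_TYPES = ("EMAIL", "DATE", "PHONE", "UNKNOWN")

# Model 1 (assumed): regexes tried in order; DATE precedes PHONE because the
# PHONE pattern would also match ISO dates.
REGEX_RULES = {
    "EMAIL": re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}"),
    "DATE": re.compile(r"\d{4}-\d{2}-\d{2}"),
    "PHONE": re.compile(r"\+?\d[\d\- ]{6,}\d"),
}

# Hypothetical mapping from semantic type to a dataset classification.
SENSITIVITY = {"EMAIL": "CONFIDENTIAL", "PHONE": "CONFIDENTIAL",
               "DATE": "INTERNAL", "UNKNOWN": "PUBLIC"}
RANK = ["PUBLIC", "INTERNAL", "CONFIDENTIAL"]  # least to most restrictive


def regex_model(entry: str) -> str:
    """Hypothetical model 1: first regex that fully matches the entry."""
    for sem_type, pattern in REGEX_RULES.items():
        if pattern.fullmatch(entry.strip()):
            return sem_type
    return "UNKNOWN"


def profile_model(entry: str) -> str:
    """Hypothetical model 2: coarse character-class heuristics."""
    s = entry.strip()
    if not s:
        return "UNKNOWN"
    digits = sum(c.isdigit() for c in s)
    if "@" in s and "." in s.rsplit("@", 1)[-1]:
        return "EMAIL"
    if len(s) == 10 and digits == 8 and s.count("-") == 2:
        return "DATE"
    if digits >= 7 and digits / len(s) > 0.6:
        return "PHONE"
    return "UNKNOWN"


def model_confidence(labels: list[str]) -> float:
    """Assumed score: share of entries carrying the model's dominant known type."""
    counts = Counter(label for label in labels if label != "UNKNOWN")
    if not counts:
        return 0.0
    return counts.most_common(1)[0][1] / len(labels)


def confidence_label(score_a: float, score_b: float,
                     high: float = 0.8, low: float = 0.5) -> str:
    """Assumed 'predetermined combination' of the two model scores."""
    if min(score_a, score_b) >= high:
        return "HIGH"
    if max(score_a, score_b) >= low:
        return "MEDIUM"
    return "LOW"


@dataclass
class Result:
    entry_types: list[str]
    scores: tuple[float, float]
    confidence_label: str
    recommendation: str


def classify_dataset(entries: list[str]) -> Result:
    labels_a = [regex_model(e) for e in entries]
    labels_b = [profile_model(e) for e in entries]
    # Assumed rule: accept an entry's type only where both models agree.
    entry_types = [a if a == b else "UNKNOWN" for a, b in zip(labels_a, labels_b)]
    scores = (model_confidence(labels_a), model_confidence(labels_b))
    label = confidence_label(*scores)
    # Recommend the most restrictive classification among identified types.
    recommendation = max((SENSITIVITY[t] for t in entry_types),
                         key=RANK.index, default="PUBLIC")
    return Result(entry_types, scores, label, recommendation)


if __name__ == "__main__":
    result = classify_dataset(
        ["alice@example.com", "bob@example.org", "n/a", "carol@example.net"])
    print(result.confidence_label, result.recommendation)  # prints: MEDIUM CONFIDENTIAL
```

In this sketch, an entry keeps a semantic type only when both models agree on it. Each model's confidence score is the fraction of entries assigned to that model's dominant type. Both rules were chosen for readability. The patent may combine the models and scores quite differently.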