DATA CONTENT IDENTIFICATION

Bibliographic details
Main authors: LORENZ BEN, BEUTLER SOPHIE
Format: Patent
Language: English
Description
Summary: The subject matter disclosed herein provides methods for identifying the type of content found in a database or source file having data records. A source file having one or more data records may be accessed. The data records may be associated with one or more data values arranged into columns. One or more data types may be proposed for at least one column by examining the data values in the column. A confidence score may be calculated for each proposed data type. The proposed data types may be arranged into a prioritized list based on each data type's confidence score. One or more rules may be applied to the column to finalize priorities of the proposed data types. The rules may be applied without referring to the data values in the column. Results may be provided based on the finalized priorities. Related apparatus, systems, techniques, and articles are also described.
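
The steps in the summary (propose types from the values, score them, prioritize, then apply value-independent rules) can be sketched roughly as follows. This is an illustration under stated assumptions, not the method claimed in the patent. The detectors, the match-fraction confidence score, and the column-name rule are all hypothetical, since the record does not specify them.

```python
import re
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical detectors (assumption): each returns True if a raw string
# looks like a value of the named data type.
DETECTORS: Dict[str, Callable[[str], bool]] = {
    "integer": lambda v: re.fullmatch(r"[+-]?\d+", v) is not None,
    "postal_code": lambda v: re.fullmatch(r"\d{5}", v) is not None,
    "date": lambda v: re.fullmatch(r"\d{4}-\d{2}-\d{2}", v) is not None,
}

Proposal = Tuple[str, float]  # (data type name, confidence score in [0, 1])


def propose_types(values: Sequence[str]) -> List[Proposal]:
    """Propose data types for one column by examining its values.

    Confidence is assumed here to be the fraction of non-empty values
    that match a detector; the source does not define the score.
    """
    non_empty = [v.strip() for v in values if v.strip()]
    if not non_empty:
        return []
    proposals: List[Proposal] = []
    for name, matches in DETECTORS.items():
        score = sum(1 for v in non_empty if matches(v)) / len(non_empty)
        if score > 0:
            proposals.append((name, score))
    # Prioritized list: highest confidence first (stable for ties).
    return sorted(proposals, key=lambda p: p[1], reverse=True)


def header_hint_rule(column_name: str, proposals: List[Proposal]) -> List[Proposal]:
    """Hypothetical rule that uses only the column name, never the values."""
    hints = {"zip": "postal_code", "postal": "postal_code"}
    preferred = {t for key, t in hints.items() if key in column_name.lower()}
    # Stable sort: hinted types move to the front, the rest keep their order.
    return sorted(proposals, key=lambda p: p[0] not in preferred)


def identify_column(
    column_name: str,
    values: Sequence[str],
    rules: Sequence[Callable[[str, List[Proposal]], List[Proposal]]] = (header_hint_rule,),
) -> List[Proposal]:
    """Score the column from its values, then finalize priorities with rules."""
    proposals = propose_types(values)
    for rule in rules:
        proposals = rule(column_name, proposals)
    return proposals


if __name__ == "__main__":
    print(identify_column("zip_code", ["12345", "90210", "10001"]))
```

In this sketch, five-digit strings match both `integer` and `postal_code` with equal confidence, so the values alone cannot separate the two; the column-name rule breaks the tie without reading any data value.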