DATA CONTENT IDENTIFICATION
Format: | Patent |
---|---|
Language: | eng |
Summary: | The subject matter disclosed herein provides methods for identifying the type of content found in a database or source file having data records. A source file having one or more data records may be accessed. The data records may be associated with one or more data values arranged into columns. One or more data types may be proposed for at least one column by examining the data values in the column. A confidence score may be calculated for each proposed data type. The proposed data types may be arranged into a prioritized list based on each data type's confidence score. One or more rules may be applied to the column to finalize priorities of the proposed data types. The rules may be applied without referring to the data values in the column. Results may be provided based on the finalized priorities. Related apparatus, systems, techniques, and articles are also described. |
---|
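
The flow outlined in the summary (propose types from a column's values, score them, rank them, then adjust the ranking with rules that do not look at the values) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the three detectors, the use of the matching share of non-empty values as the confidence score, and the column-name rule with its hint table are all hypothetical.

```python
import re
from datetime import datetime


def _is_integer(value: str) -> bool:
    return re.fullmatch(r"[+-]?\d+", value) is not None


def _is_date(value: str) -> bool:
    # Assumption: only ISO-style dates (YYYY-MM-DD) count as dates here.
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def _is_email(value: str) -> bool:
    return re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", value) is not None


# Hypothetical detector table: data type name -> predicate on one value.
DETECTORS = {"integer": _is_integer, "date": _is_date, "email": _is_email}


def propose_types(values):
    """Propose data types for one column by examining its values.

    Returns {type_name: confidence}. The confidence is assumed to be the
    share of non-empty values that the detector accepts.
    """
    non_empty = [v.strip() for v in values if v and v.strip()]
    if not non_empty:
        return {}
    scores = {}
    for name, detect in DETECTORS.items():
        matches = sum(1 for v in non_empty if detect(v))
        if matches:
            scores[name] = matches / len(non_empty)
    return scores


def prioritize(scores):
    """Arrange proposed types into a list, highest confidence first."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def apply_name_rule(column_name, ranked):
    """Finalize priorities using only the column name, never the values.

    Hypothetical rule: a type hinted at by the column name moves to the
    front. The sort is stable, so the confidence order is otherwise kept.
    """
    hints = {"date": "date", "mail": "email", "id": "integer"}
    lowered = column_name.lower()
    hinted = {t for fragment, t in hints.items() if fragment in lowered}
    return sorted(ranked, key=lambda item: item[0] not in hinted)


if __name__ == "__main__":
    column = ["1001", "1002", "2021-01-05", "2021-01-06", "2021-01-07"]
    ranked = prioritize(propose_types(column))
    print(ranked)
    print(apply_name_rule("customer_id", ranked))
```

In this sketch the value-based ranking puts `date` ahead of `integer` for the sample column, and the name rule then promotes `integer` because the column is called `customer_id`. This mirrors the separation the summary draws between scoring from data values and finalizing priorities without them.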