Inferring database relationships
Main authors: | , , , , , |
---|---|
Format: | Patent |
Language: | eng |
Summary: | A computer-implemented method of identifying one or more relationships between columns in a database, each column having an associated column name, the method comprising: tokenising the column name of each column in the database; and identifying one or more relationships between columns based on one or more correlations between pairs of tokenised column names. A column name may be tokenised by identifying a plurality of substrings in the column name; the substrings may be identified based on a substring delimiter within the column name. The correlations may include, among others: a term frequency-inverse document frequency (TF-IDF) correlation; a Pearson correlation coefficient computation; a determination that the degree of literal similarity between at least part of the column names, based on the tokenised column names, meets a predetermined threshold; and a determination that a degree of phonetic similarity meets a predetermined threshold. The method may also involve evaluating a distance measure for each pair of related columns and applying a clustering algorithm based on the distance measures to identify one or more clusters of pairs of columns having a degree of similarity. |
---|
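The steps named in the summary (tokenising column names on a delimiter, scoring pairs by literal similarity against a threshold, then clustering on a distance measure) can be illustrated with a minimal Python sketch. It is not the patented method: the function names `tokenise`, `related_pairs` and `cluster`, the camelCase splitting, the choice of Jaccard overlap as the literal-similarity measure, the 0.5 thresholds, and the use of connected components as the clustering step are all assumptions made for illustration. For simplicity it groups columns rather than pairs of columns.

```python
import re
from itertools import combinations


def tokenise(column_name):
    """Split a column name into lowercase tokens.

    Assumption: delimiters are any non-alphanumeric characters, and a
    lower-to-upper case change (camelCase) also counts as a boundary.
    """
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", column_name)
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", spaced) if t]


def related_pairs(columns, threshold=0.5):
    """Return (col_a, col_b, distance) for pairs whose token overlap
    meets the threshold.

    `columns` is a list of hypothetical (table, column_name) tuples.
    Similarity is the Jaccard overlap of the token sets; distance is
    1 - similarity.
    """
    tokens = {col: set(tokenise(col[1])) for col in columns}
    pairs = []
    for a, b in combinations(columns, 2):
        union = tokens[a] | tokens[b]
        similarity = len(tokens[a] & tokens[b]) / len(union) if union else 0.0
        if similarity >= threshold:
            pairs.append((a, b, 1.0 - similarity))
    return pairs


def cluster(pairs, max_distance=0.5):
    """Group columns linked by pairs within max_distance.

    Uses union-find to compute connected components, a simple stand-in
    for whatever clustering algorithm an implementation might choose.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, distance in pairs:
        if distance <= max_distance:
            parent[find(a)] = find(b)

    groups = {}
    for col in list(parent):
        groups.setdefault(find(col), set()).add(col)
    return list(groups.values())


# Hypothetical schema used only to exercise the sketch.
columns = [
    ("orders", "customer_id"),
    ("customers", "CustomerID"),
    ("orders", "order_date"),
    ("shipments", "orderDate"),
    ("products", "sku"),
]
for group in cluster(related_pairs(columns)):
    print(sorted(group))
```

In this sketch, columns that never appear in a qualifying pair (such as `sku`) are left out of the clusters. A TF-IDF weighting, a Pearson correlation or a phonetic comparison could replace the Jaccard score inside `related_pairs` without changing the surrounding structure.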