Inferring database relationships


Bibliographic details
Main authors: John McCall, Benjamin Lacroix, Mathias Kern, Akinola Ogunsemi, Gilbert Owusu, David Corsar
Format: Patent
Language: English
Description
Summary: A computer-implemented method of identifying one or more relationships between columns in a database, each column having an associated column name, the method comprising: tokenising the column names of each column in the database; and identifying one or more relationships between columns based on one or more correlations between pairs of tokenised column names. The column name may be tokenised by identifying a plurality of substrings in the column name; the substrings may be identified based on a substring delimiter within the column name. The correlations may include one or more of: a term frequency-inverse document frequency (TF-IDF) correlation; a Pearson correlation coefficient computation; a determination of a degree of literal similarity between at least part of the column names, based on the tokenised column names, that meets a predetermined threshold; and a determination of a degree of phonetic similarity that meets a predetermined threshold, among others. The method may also involve evaluating a distance measure for each pair of related columns and applying a clustering algorithm based on the distance measures to identify one or more clusters of pairs of columns having a degree of similarity.
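The steps named in the summary (tokenise, score pairs, threshold, cluster) can be illustrated with a minimal Python sketch. This is not the patented method: the function names, the delimiter set, the sample column names and the thresholds are all hypothetical, Jaccard overlap of token sets stands in for the "literal similarity" measure, and single-linkage grouping via union-find stands in for the unspecified clustering algorithm.

```python
import re
from itertools import combinations


def tokenise(column_name, delimiters="_-. "):
    """Split a column name into lower-case substrings at delimiter characters."""
    pattern = "[" + re.escape(delimiters) + "]+"
    return [tok.lower() for tok in re.split(pattern, column_name) if tok]


def jaccard(tokens_a, tokens_b):
    """Literal similarity stand-in: shared tokens divided by all distinct tokens."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def related_pairs(columns, threshold):
    """Return (col_a, col_b, similarity) for every pair meeting the threshold."""
    tokens = {col: tokenise(col) for col in columns}
    pairs = []
    for a, b in combinations(columns, 2):
        sim = jaccard(tokens[a], tokens[b])
        if sim >= threshold:
            pairs.append((a, b, sim))
    return pairs


def cluster(columns, pairs, max_distance):
    """Single-linkage grouping: merge columns whose distance (1 - similarity)
    is at most max_distance. Returns only groups with two or more columns."""
    parent = {col: col for col in columns}

    def find(col):
        # Follow parent links to the group representative, compressing the path.
        while parent[col] != col:
            parent[col] = parent[parent[col]]
            col = parent[col]
        return col

    for a, b, sim in pairs:
        if 1.0 - sim <= max_distance:
            parent[find(a)] = find(b)

    groups = {}
    for col in columns:
        groups.setdefault(find(col), []).append(col)
    return [members for members in groups.values() if len(members) > 1]


# Hypothetical column names, chosen only for illustration.
columns = ["customer_id", "CUSTOMER-ID", "order_id", "order_date", "ship_date"]
pairs = related_pairs(columns, threshold=0.3)
tight_clusters = cluster(columns, pairs, max_distance=0.5)
loose_clusters = cluster(columns, pairs, max_distance=0.7)
```

With these sample names, the tight setting groups only the two spellings of the customer identifier, while the loose setting chains every column into one cluster through shared tokens such as `id`, `order` and `date`. That chaining is a known property of single linkage and one reason a real system might prefer a different clustering algorithm or combine several of the correlation measures listed in the summary.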