Inferring database relationships


Bibliographic details
Main authors:	John McCall, Benjamin Lacroix, Mathias Kern, Akinola Ogunsemi, Gilbert Owusu, David Corsar
Format:	Patent
Language:	English
Subjects:	CALCULATING; COMPUTING; COUNTING; ELECTRIC DIGITAL DATA PROCESSING; PHYSICS

Description
Abstract:	A computer-implemented method of identifying one or more relationships between columns in a database, each column having an associated column name, the method comprising: tokenising the column names of each column in the database; and identifying one or more relationships between columns based on one or more correlations between pairs of tokenised column names. A column name may be tokenised by identifying a plurality of substrings in the column name; the substrings may be identified based on a substring delimiter within the column name. The correlations may include one or more of: a term frequency-inverse document frequency (TF-IDF) correlation; a Pearson correlation coefficient computation; a determination that the degree of literal similarity between at least parts of column names, based on the tokenised column names, meets a predetermined threshold; a determination that a degree of phonetic similarity meets a predetermined threshold; and others. The method may also involve evaluating a distance measure for each pair of related columns and applying a clustering algorithm based on the distance measures to identify one or more clusters of pairs of columns having a degree of similarity.
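
Illustrative sketch (not part of the patent record): the pipeline the abstract outlines, delimiter-based tokenisation, a thresholded similarity between tokenised names, a distance per related pair, and clustering, might look roughly like the following Python. Everything specific here is an assumption made for illustration: the function names, the `.` and `_` delimiters, the sample column names, the 0.5 thresholds, the Jaccard token overlap used as a stand-in for the literal-similarity correlation, and the single-linkage grouping used as a stand-in for the unspecified clustering algorithm. The TF-IDF, Pearson, and phonetic correlations are not covered.

```python
import re
from itertools import combinations


def tokenise(column_name, delimiters="._"):
    """Split a column name into lowercase substrings at delimiter characters."""
    pattern = "[" + re.escape(delimiters) + "]+"
    return [part.lower() for part in re.split(pattern, column_name) if part]


def token_overlap(name_a, name_b):
    """Jaccard overlap of the two token sets (assumed similarity measure)."""
    a, b = set(tokenise(name_a)), set(tokenise(name_b))
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def related_pairs(column_names, threshold=0.5):
    """Return (name_a, name_b, distance) for pairs whose overlap meets the threshold."""
    pairs = []
    for name_a, name_b in combinations(column_names, 2):
        score = token_overlap(name_a, name_b)
        if score >= threshold:
            pairs.append((name_a, name_b, 1.0 - score))  # distance = 1 - similarity
    return pairs


def cluster(column_names, pairs, max_distance=0.5):
    """Group columns joined by pairs within max_distance (single linkage, union-find)."""
    parent = {name: name for name in column_names}

    def find(name):
        # Walk up to the representative, shortening the path as we go.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for name_a, name_b, distance in pairs:
        if distance <= max_distance:
            parent[find(name_a)] = find(name_b)

    groups = {}
    for name in column_names:
        groups.setdefault(find(name), []).append(name)
    # Only groups with more than one column represent a relationship.
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Hypothetical table-qualified column names.
columns = [
    "orders.cust_id",
    "customers.cust_id",
    "orders.order_date",
    "customers.postcode",
]
pairs = related_pairs(columns)
clusters = cluster(columns, pairs)
print(pairs)
print(clusters)
```

With these sample names, only `orders.cust_id` and `customers.cust_id` share enough tokens to be flagged as related, which is the kind of foreign-key-like link the method aims to surface. A set-based overlap ignores token order and spelling variants, so a character-level or phonetic comparison would be needed to catch names such as `cust_id` versus `customer_id`.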