Identifying source datasets that fit transfer learning process for target domain

A method for quantifying a similarity between a target dataset and multiple source datasets and identifying one or more source datasets that are most similar to the target dataset is provided. The method includes receiving, at a computing system, source datasets relating to a source domain and a tar...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: Bar Haim, Andrey Finkelshtein, Eitan Menahem, Noga Agmon
Format: Patent
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:A method for quantifying a similarity between a target dataset and multiple source datasets and identifying one or more source datasets that are most similar to the target dataset is provided. The method includes receiving, at a computing system, source datasets relating to a source domain and a target dataset relating to a target domain of interest. Each dataset is arranged in a tabular format including columns and rows, and the source datasets and the target dataset include a same feature space. The method also includes pre-processing, via a processor of the computing system, each source-target dataset pair to remove non-intersecting columns. The method further includes calculating at least two of a dataset similarity score, a row similarity score, and a column similarity score for each source-target dataset pair, and summarizing the calculated similarity scores to identify one or more source datasets that are most similar to the target dataset.