Identifying source datasets of transfer learning processes fitting target domains

Bibliographic details
Main authors: HAIM BEN, MENAHEM EITAN, FINKELSTEIN AMIT, AGMON NEAL
Format: Patent
Language: Chinese; English
Description
Summary: A method is provided for quantifying similarities between a target dataset and a plurality of source datasets and for identifying the one or more source datasets that are most similar to the target dataset. The method includes receiving, at a computing system, source datasets related to a source domain and a target dataset related to a target domain of interest. Each dataset is arranged in a table format comprising columns and rows, and the source datasets and the target dataset comprise the same feature space. The method also includes preprocessing, via a processor of the computing system, each source-target dataset pair to remove disjoint columns. The method further includes calculating at least two of a dataset similarity score, a row similarity score, and a column similarity score for each source-target dataset pair, and aggregating the calculated similarity scores to identify the one or more source datasets that are most similar to the target dataset.
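The pipeline in the summary (drop disjoint columns, score each source-target pair, aggregate, rank) can be sketched roughly as follows. This is only an illustration under stated assumptions: the record does not say how the similarity scores are defined or aggregated, so `column_similarity`, `row_similarity`, the unweighted mean used for aggregation, and all function and dataset names here are hypothetical stand-ins, not the patented method.

```python
from math import dist
from statistics import mean

# A table is modelled as {column name: list of numeric values} (assumption).

def drop_disjoint_columns(source, target):
    """Keep only the columns that appear in both tables, in the same order."""
    shared = [c for c in source if c in target]
    return {c: source[c] for c in shared}, {c: target[c] for c in shared}

def column_similarity(source, target):
    """Hypothetical score: closeness of per-column means, averaged over columns."""
    return mean(1.0 / (1.0 + abs(mean(source[c]) - mean(target[c]))) for c in source)

def rows(table):
    """Turn a column-oriented table into a list of row tuples."""
    return list(zip(*table.values()))

def row_similarity(source, target):
    """Hypothetical score: closeness of each target row to its nearest source row."""
    src_rows = rows(source)
    return mean(1.0 / (1.0 + min(dist(t, s) for s in src_rows)) for t in rows(target))

def rank_sources(sources, target):
    """Score every source-target pair and rank sources, most similar first."""
    scores = {}
    for name, source in sources.items():
        s, t = drop_disjoint_columns(source, target)
        if not s:  # no shared columns: treat as completely dissimilar
            scores[name] = 0.0
            continue
        # Aggregation by unweighted mean is an assumption of this sketch.
        scores[name] = mean([column_similarity(s, t), row_similarity(s, t)])
    return sorted(scores, key=scores.get, reverse=True), scores

# Toy data, invented purely for illustration.
target = {"age": [30, 40], "income": [50, 60]}
sources = {
    "near": {"age": [31, 41], "income": [51, 61], "extra": [0, 0]},
    "far": {"age": [70, 80], "income": [10, 20]},
}
ranking, scores = rank_sources(sources, target)
print(ranking)
```

In this toy run the `extra` column of the `near` table is dropped before scoring because the target has no such column, and only two of the three score types named in the summary are computed, which matches the "at least two of" wording.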