ENTITY MATCHING WITH JOINT LEARNING OF BLOCKING AND MATCHING

Bibliographic Details
Main Authors: BAUER, Martin; CHENG, Bin; FUERST, Jonathan
Format: Patent
Language: English
Description
Summary: A method is provided for identifying entities from different data sources as matching entity pairs that refer to the same real-world object. A set of labelling functions is provided to determine matching and non-matching entities of a source data set and at least one target data set. A subset of the provided labelling functions is selected for training machine learning models for a blocking module, which aims to filter out as many non-matching entity pairs as possible without missing any true matches, and for a matching module, which aims to predict matching results for the remaining entity pairs not filtered out by the blocking module. A blocking model for the blocking module and a matching model for the matching module are jointly learned based on the available unlabeled entity pairs and the labelling functions of the selected subset.
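
The roles of labelling functions, blocking and matching described in the summary can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the record fields (`id`, `name`, `city`), the three labelling functions, the majority vote, and the token-overlap filter are hypothetical stand-ins. It does not implement the jointly learned blocking and matching models, which this record does not describe in detail.

```python
from itertools import product

MATCH, NON_MATCH, ABSTAIN = 1, 0, -1

# Hypothetical labelling functions: each votes on one entity pair or abstains.
def lf_same_name(a, b):
    """Vote MATCH when the normalized names are identical, else abstain."""
    same = a["name"].strip().lower() == b["name"].strip().lower()
    return MATCH if same else ABSTAIN

def lf_different_city(a, b):
    """Vote NON_MATCH when the cities differ, else abstain."""
    return NON_MATCH if a["city"].lower() != b["city"].lower() else ABSTAIN

def lf_no_shared_token(a, b):
    """Vote NON_MATCH when the names share no token, else abstain."""
    shared = set(a["name"].lower().split()) & set(b["name"].lower().split())
    return ABSTAIN if shared else NON_MATCH

LABELLING_FUNCTIONS = [lf_same_name, lf_different_city, lf_no_shared_token]

def weak_label(a, b, lfs=LABELLING_FUNCTIONS):
    """Combine votes by simple majority; ties and all-abstain yield ABSTAIN."""
    votes = [v for v in (lf(a, b) for lf in lfs) if v != ABSTAIN]
    matches = sum(votes)              # MATCH is 1, NON_MATCH is 0
    non_matches = len(votes) - matches
    if matches == non_matches:
        return ABSTAIN
    return MATCH if matches > non_matches else NON_MATCH

def block(source, target):
    """Blocking stand-in: drop pairs whose names share no token."""
    return [(a, b) for a, b in product(source, target)
            if lf_no_shared_token(a, b) == ABSTAIN]

def match(candidates):
    """Matching stand-in: keep candidate pairs whose weak label is MATCH."""
    return [(a["id"], b["id"]) for a, b in candidates
            if weak_label(a, b) == MATCH]

# Made-up example records, not taken from the patent.
source = [{"id": "s1", "name": "Acme Corp", "city": "Berlin"},
          {"id": "s2", "name": "Globex", "city": "Paris"}]
target = [{"id": "t1", "name": "acme corp", "city": "Berlin"},
          {"id": "t2", "name": "Initech", "city": "Austin"},
          {"id": "t3", "name": "Acme Ltd", "city": "Munich"}]

candidates = block(source, target)   # pairs that survive blocking
matches = match(candidates)          # pairs predicted to match
print(len(candidates), matches)
```

In this toy data set, blocking reduces the six possible pairs to two candidates, and the matching step keeps only the pair whose names agree. In the method summarized above, both steps would instead be machine learning models trained jointly from unlabeled pairs and the selected labelling functions.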