Detecting Near Duplicate Dataset with Machine Learning

This paper introduces the concept of a near-duplicate dataset: a quasi-duplicate version of a dataset that has undergone an unknown number of row and column insertions and deletions (that is, modifications to both its schema and its instances). This concept is interesting for data exploration, data integration and...
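To make the definition concrete, here is a minimal sketch that derives a near-duplicate from a small table through one schema deletion, one schema insertion, one row deletion and one row insertion. It assumes a table is modelled as a list of column names plus a list of row tuples. The helper names (`delete_column`, `insert_column`, `delete_rows`, `insert_rows`, `column_jaccard`, `row_jaccard`) are hypothetical. The two Jaccard scores are a naive overlap baseline for illustration only, not the machine learning method the paper proposes.

```python
from typing import Any, List, Sequence, Set, Tuple

# Assumed representation for illustration: (column names, row tuples).
Table = Tuple[List[str], List[Tuple[Any, ...]]]


def delete_column(table: Table, name: str) -> Table:
    """Schema modification: drop one column from the header and every row."""
    cols, rows = table
    i = cols.index(name)
    return cols[:i] + cols[i + 1:], [r[:i] + r[i + 1:] for r in rows]


def insert_column(table: Table, name: str, values: Sequence[Any]) -> Table:
    """Schema modification: append a new column with one value per row."""
    cols, rows = table
    if len(values) != len(rows):
        raise ValueError("need exactly one value per row")
    return cols + [name], [r + (v,) for r, v in zip(rows, values)]


def delete_rows(table: Table, indices: Set[int]) -> Table:
    """Instance modification: drop the rows at the given positions."""
    cols, rows = table
    return cols, [r for i, r in enumerate(rows) if i not in indices]


def insert_rows(table: Table, new_rows: List[Tuple[Any, ...]]) -> Table:
    """Instance modification: append rows that match the current schema."""
    cols, rows = table
    if any(len(r) != len(cols) for r in new_rows):
        raise ValueError("row width does not match schema")
    return cols, rows + list(new_rows)


def column_jaccard(a: Table, b: Table) -> float:
    """Jaccard similarity of the two sets of column names."""
    sa, sb = set(a[0]), set(b[0])
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0


def row_jaccard(a: Table, b: Table) -> float:
    """Jaccard similarity of the rows, projected onto the shared columns."""
    shared = [c for c in a[0] if c in b[0]]
    if not shared:
        return 0.0

    def project(t: Table) -> Set[Tuple[Any, ...]]:
        idx = [t[0].index(c) for c in shared]
        return {tuple(r[i] for i in idx) for r in t[1]}

    pa, pb = project(a), project(b)
    return len(pa & pb) / len(pa | pb) if pa | pb else 1.0


original: Table = (["id", "name", "city"],
                   [(1, "Ann", "Paris"), (2, "Bob", "Lyon"), (3, "Cy", "Nice")])
nd = delete_column(original, "city")         # schema: remove a column
nd = insert_column(nd, "age", [30, 40, 50])  # schema: add a column
nd = delete_rows(nd, {1})                    # instance: remove Bob's row
nd = insert_rows(nd, [(4, "Dee", 20)])       # instance: add a new row
print(column_jaccard(original, nd), row_jaccard(original, nd))  # 0.5 0.5
```

After only four edits, half of the column names and half of the projected rows still coincide. A detector has to recognise partial overlap of this kind, typically without knowing which operations were applied.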

Bibliographic details
Published in: International journal of computer information systems and industrial management applications, 2022, Vol. 14, pp. 374-385
Authors: Chevallier, Marc; Rogovschi, Nicoleta; Boufarès, Faouzi; Grozavu, Nistor; Clairmont, Charly
Format: Article
Language: English