Deep Learning Approaches for Similarity Computation: A Survey
Published in: | IEEE Transactions on Knowledge and Data Engineering, 2024-12, Vol. 36 (12), p. 7893-7912 |
---|---|
Main authors: | , , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | Measuring the similarity between data objects is a common and vital task in various domains, such as data mining and machine learning. Driven by abundant real-world applications, many well-known similarity (distance) metrics have been proposed to measure the pairwise similarity of data objects, e.g., graph edit distance for graphs and dynamic time warping for time series. However, many similarity metrics suffer from high time complexity. More specifically, most of the well-known similarity metrics need quadratic time or more to compute the ground-truth similarity, and some of them are proven to be NP-hard. With the development of deep learning techniques, there is an emerging research trend on learning for similarity computation on various data types in the fields of databases (DB) and data mining, which is quite different from the metric learning studies in the machine learning (ML) literature. Specifically, the studies in the ML literature focus on learning semantic similarity for specific tasks, which is implicitly indicated by the training data, on data in the feature space. The studies in the DB literature, in contrast, usually consider learning well-defined similarity metrics (e.g., graph edit distance) on the data objects themselves (e.g., graphs), so that learning can benefit similarity computation in multiple aspects, such as computation time, metric quality and search heuristics, and the learned representations of the data can also be naturally fed to downstream tasks. This survey paper provides a comprehensive review of similarity computation learning on several data types, including sets, sequences and graphs. We first classify the learning-based approaches by their learning target into three categories, i.e., similarity learning, cost matrix learning and search heuristic learning. Then we detail some representative approaches for each category on every data type, and analyze some key features that are utilized by these approaches. Finally, we discuss some challenges and future directions for similarity computation learning on these data types. |
---|---|
ISSN: | 1041-4347, 1558-2191 |
DOI: | 10.1109/TKDE.2024.3422484 |
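
Note on the quadratic cost mentioned in the summary: the sketch below illustrates why a classic metric such as dynamic time warping needs time proportional to the product of the two sequence lengths. It is a textbook dynamic program, not code from the surveyed paper. The function name `dtw_distance` and the use of absolute difference as the local cost are assumptions made for this illustration.

```python
def dtw_distance(a, b):
    """Textbook dynamic time warping between two numeric sequences.

    Illustrative only: the absolute difference as local cost is an
    assumption, not something the catalog record specifies.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal warping cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Every one of the n * m cells is filled once: quadratic time.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] also matched to an earlier b
                cost[i][j - 1],      # b[j-1] also matched to an earlier a
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


if __name__ == "__main__":
    print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

The nested loop visits all n × m table cells, which is the exact computation that learning-based approaches of the kind the summary describes aim to approximate or speed up.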