Scalable Feature Matching Across Large Data Collections
Main author: | |
---|---|
Format: | Article |
Language: | English |
Keywords: | |
Abstract: | This paper is concerned with matching feature vectors in a one-to-one fashion
across large collections of datasets. Formulating this task as a
multidimensional assignment problem with decomposable costs (MDADC), we develop
extremely fast algorithms with time complexity linear in the number $n$ of
datasets and space complexity a small fraction of the data size. These
remarkable properties hinge on using the squared Euclidean distance as
dissimilarity function, which can reduce ${n \choose 2}$ matching problems
between pairs of datasets to $n$ problems and enable calculating assignment
costs on the fly. To our knowledge, no other method applicable to the MDADC
possesses the linear scaling and low-storage properties necessary for
large-scale applications. In numerical experiments, the novel algorithms
outperform competing methods and show excellent computational and optimization
performance. An application of feature matching to a large neuroimaging
database is presented. The algorithms of this paper are implemented in the R
package matchFeat available at https://github.com/ddegras/matchFeat. |
---|---|
DOI: | 10.48550/arxiv.2101.02035 |
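
The abstract states that the squared Euclidean distance lets ${n \choose 2}$ pairwise comparisons collapse to $n$, but does not spell out why. One standard identity that is consistent with this claim (an assumption about the mechanism, not a quotation from the paper) is that, for vectors $x_1, \dots, x_n$ with mean $\bar{x}$, $\sum_{i<j} \|x_i - x_j\|^2 = n \sum_i \|x_i - \bar{x}\|^2$. The sketch below checks the identity numerically in plain Python. The function names and the random test cluster are hypothetical, and the code is not taken from the matchFeat package.

```python
import random
from itertools import combinations


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def pairwise_cost(cluster):
    """Sum of squared distances over all C(n, 2) pairs of vectors."""
    return sum(sq_dist(u, v) for u, v in combinations(cluster, 2))


def centroid_cost(cluster):
    """n times the sum of squared distances to the cluster mean (n terms)."""
    n = len(cluster)
    dim = len(cluster[0])
    mean = [sum(v[k] for v in cluster) / n for k in range(dim)]
    return n * sum(sq_dist(v, mean) for v in cluster)


random.seed(0)
# Hypothetical cluster: one 3-dimensional feature vector from each of n = 6 datasets.
cluster = [[random.gauss(0.0, 1.0) for _ in range(3)] for _ in range(6)]

pairwise = pairwise_cost(cluster)   # 15 distance evaluations
via_mean = centroid_cost(cluster)   # 6 distance evaluations
print(abs(pairwise - via_mean) < 1e-9)
```

The pairwise form needs a number of distance evaluations that grows quadratically in the number of datasets, while the centroid form needs only one per dataset. This is the kind of saving the abstract attributes to the squared Euclidean dissimilarity.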