A fast and efficient large-scale near duplicate image retrieval system using double perceptual hashing
Published in: | Signal, Image and Video Processing, 2024-12, Vol.18 (12), p.8565-8575 |
---|---|
Main authors: | , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | With the ever-increasing volume of digital images available online, it has become important to identify similar images quickly and accurately across a variety of domains. Perceptual hashing is known to be the most widely used method for such near-duplicate image retrieval. While content-based features provide superior accuracy in detecting similar images, using hash codes derived from these features reduces storage requirements and improves time efficiency. However, as the image volume increases, the computational complexity of perceptual hashing poses a challenge. Another significant challenge is the robustness of perceptual hash functions against adversarial manipulations. To deal with these issues and to improve the accuracy of near-duplicate image retrieval, this paper proposes a double perceptual hashing approach. Here, the primary hash performs coarse matching and retrieves all images relevant to the query image. Subsequently, a secondary hash performs fine matching by eliminating the false positives returned by the primary hash. While dual hash functions enhance robustness, another novel strategy of partitioning the primary hash into equal-sized segments boosts storage efficiency and accelerates the search by more than tenfold compared to the naive approach. Experimental results using the Copydays dataset augmented with 30,000 random images show an average mAP of 0.89 and an average response time of 0.101 s, verifying the method's efficiency on large datasets. |
---|---|
ISSN: | 1863-1703 1863-1711 |
DOI: | 10.1007/s11760-024-03490-w |
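
The two-stage retrieval described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the hash functions, the hash length, the number of segments, or the distance thresholds, so `HASH_BITS`, `NUM_SEGMENTS`, `coarse_max`, `fine_max`, and the class `DoubleHashIndex` are all assumptions made for illustration. The sketch takes precomputed integer hash codes as input and uses a standard segment-lookup idea: if two codes differ in at most `NUM_SEGMENTS - 1` bits, at least one of their equal-sized segments must match exactly, so candidates can be found by table lookup instead of a full scan.

```python
from collections import defaultdict

HASH_BITS = 64       # assumed hash length; not stated in the abstract
NUM_SEGMENTS = 4     # assumed number of equal-sized segments
SEG_BITS = HASH_BITS // NUM_SEGMENTS
SEG_MASK = (1 << SEG_BITS) - 1


def hamming(a, b):
    """Number of differing bits between two integer hash codes."""
    return bin(a ^ b).count("1")


def segments(code):
    """Split a hash code into equal-sized chunks, lowest bits first."""
    return [(code >> (i * SEG_BITS)) & SEG_MASK for i in range(NUM_SEGMENTS)]


class DoubleHashIndex:
    """Hypothetical index: segment lookup on the primary hash,
    then verification with the secondary hash."""

    def __init__(self):
        # One lookup table per segment position: segment value -> image ids.
        self.tables = [defaultdict(set) for _ in range(NUM_SEGMENTS)]
        self.hashes = {}  # image id -> (primary, secondary)

    def add(self, image_id, primary, secondary):
        self.hashes[image_id] = (primary, secondary)
        for pos, seg in enumerate(segments(primary)):
            self.tables[pos][seg].add(image_id)

    def query(self, primary, secondary, coarse_max=3, fine_max=10):
        # Coarse stage: collect every image sharing at least one primary
        # segment with the query. Complete as long as
        # coarse_max <= NUM_SEGMENTS - 1 (pigeonhole principle).
        candidates = set()
        for pos, seg in enumerate(segments(primary)):
            candidates |= self.tables[pos].get(seg, set())

        # Fine stage: keep candidates that pass both distance checks,
        # which drops false positives from the coarse stage.
        results = []
        for image_id in candidates:
            p, s = self.hashes[image_id]
            if hamming(primary, p) <= coarse_max:
                fine_dist = hamming(secondary, s)
                if fine_dist <= fine_max:
                    results.append((fine_dist, image_id))
        return [image_id for _, image_id in sorted(results)]
```

In this sketch an image whose primary hash collides with the query but whose secondary hash is far away is rejected in the fine stage, which mirrors the false-positive elimination the abstract attributes to the secondary hash. The thresholds would need tuning against real data.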