Semantic image representation for image recognition and retrieval using multilayer variational auto-encoder, InceptionNet and low-level image features

This paper presents a novel image descriptor that enhances performance in image recognition and retrieval by combining deep learning and handcrafted features. Our method integrates high-level semantic features extracted via InceptionResNet-V2 with color and texture features to create a comprehensive...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	The Journal of supercomputing 2025, Vol.81 (1), Article 346
Hauptverfasser:	Giveki, Davar, Esfandyari, Sajad
Format:	Artikel
Sprache:	eng
Schlagworte:	Classification Compilers Computer Science Computer vision Feature extraction Interpreters Labels Machine learning Multilayers Processor Architectures Programming Languages Representations Retrieval Semantics Texture recognition
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	This paper presents a novel image descriptor that enhances performance in image recognition and retrieval by combining deep learning and handcrafted features. Our method integrates high-level semantic features extracted via InceptionResNet-V2 with color and texture features to create a comprehensive representation of image content. The descriptor’s effectiveness is demonstrated through extensive experiments across a range of image recognition and retrieval tasks. Our approach is tested on six benchmark datasets, including Corel-1 K, VS, OT, QT, SUN-397, and ILSVRC-2012 for single-label classification, and COCO and NUS-WIDE for multi-label classification, achieving high performances. The results establish that the proposed method is versatile and robust, excelling in single-label and multi-label recognition as well as image retrieval tasks, and outperforms several state-of-the-art methods. This work provides a significant advancement in image representation, with broad applicability in various computer vision domains.
ISSN:	0920-8542 1573-0484
DOI:	10.1007/s11227-024-06792-5