Deep-view linguistic and inductive learning (DvLIL) based framework for Image Retrieval
Published in: | Information sciences 2023-11, Vol.649, p.119641, Article 119641 |
---|---|
Main authors: | , , , |
Format: | Article |
Language: | English |
Keywords: | |
Online access: | Full text |
Abstract: | The abundance of data on the Internet makes it difficult to retrieve desired information from such a large volume of content. In some situations, a user needs to modify the desired information across different modalities. A common example is retrieving a desired product from the inventory of an online commerce platform. In such cases, a dual-modality Content-Based Image Retrieval (CBIR) system plays a key role in facilitating communication between the user and the agent. This research proposes a framework for retrieving desired images with modified features. The framework extracts image and text features and then learns their combined representation through inductive learning. It learns deep visual features, which are then modified by the linguistic semantics of the text query. State-of-the-art deep learning techniques are employed to obtain dense representations of both image and text features. After the image and text queries are represented, their combined representation is learned using a sequence of multi-layer perceptrons (MLPs). The proposed approach achieved superior performance on the real-world benchmark datasets Fashion-200K and MIT-States. |
---|---|
ISSN: | 0020-0255, 1872-6291 |
DOI: | 10.1016/j.ins.2023.119641 |
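The abstract states that image and text features are extracted separately and that their combined representation is learned by a sequence of MLPs. The following is a minimal sketch of that composition step under stated assumptions: fusion by plain concatenation, a two-layer ReLU MLP with random (untrained) weights, and ranking of gallery images by cosine similarity. The function names (`compose_query`, `rank_gallery`), the dimensions, and the concatenation fusion are hypothetical illustrations, not the DvLIL authors' implementation; the image and text encoders are replaced by random stand-in vectors.

```python
import math
import random

# Hypothetical sketch: compose an image query with a text modification through
# a stack of dense (MLP) layers, then rank gallery embeddings by cosine similarity.
# The encoders are NOT modelled; features are plain lists of floats.


def init_layer(in_dim, out_dim, rng):
    """Create one dense layer: out_dim rows of in_dim weights, plus zero bias."""
    scale = 1.0 / math.sqrt(in_dim)
    weights = [[rng.uniform(-scale, scale) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias


def linear(layer, v):
    """Apply y = W v + b for a single layer."""
    weights, bias = layer
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]


def mlp(layers, v):
    """Run v through the layers, with ReLU between layers but not after the last one."""
    for i, layer in enumerate(layers):
        v = linear(layer, v)
        if i < len(layers) - 1:
            v = [max(0.0, x) for x in v]
    return v


def compose_query(image_feat, text_feat, layers):
    """Fuse image and text features by concatenation (an assumption), then apply the MLP stack."""
    return mlp(layers, image_feat + text_feat)


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_gallery(query, gallery):
    """Return gallery indices ordered from most to least similar to the composed query."""
    scored = [(cosine(query, g), idx) for idx, g in enumerate(gallery)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored]


if __name__ == "__main__":
    rng = random.Random(0)
    IMG_DIM, TXT_DIM, HIDDEN_DIM, EMB_DIM = 8, 4, 16, 8  # hypothetical sizes
    layers = [init_layer(IMG_DIM + TXT_DIM, HIDDEN_DIM, rng),
              init_layer(HIDDEN_DIM, EMB_DIM, rng)]
    image_feat = [rng.gauss(0.0, 1.0) for _ in range(IMG_DIM)]  # stand-in for image-encoder output
    text_feat = [rng.gauss(0.0, 1.0) for _ in range(TXT_DIM)]   # stand-in for text-encoder output
    query = compose_query(image_feat, text_feat, layers)
    gallery = [[rng.gauss(0.0, 1.0) for _ in range(EMB_DIM)] for _ in range(5)]
    print(rank_gallery(query, gallery))
```

With trained weights, the MLP stack would learn how the text modification shifts the image embedding. Here the weights are random, so the ranking only demonstrates the data flow from the two modalities to a single composed query.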