Multimodal Retrieval using Mutual Information based Textual Query Reformulation

Bibliographic details
Published in: Expert Systems with Applications, 2017-02, Vol. 68, p. 81-92
Main authors: Datta, Deepanwita; Varma, Shubham; Chowdary C., Ravindranath; Singh, Sanjay K.
Format: Article
Language: English
Description
Abstract:
• Multimodal retrieval efficiency can be improved by textual query reformulation.
• A graph-based keyphrase extraction method incorporating the correlation of terms is proposed.
• The textual query is expanded with the relevant part of narratives and the extracted keyphrases.
• Text and image features are combined using a weight-learning model.
• The proposed method significantly improves both image and text retrieval efficiency.

Multimodal retrieval is a well-established approach to image retrieval. Images are usually accompanied by a text caption and by associated documents describing the image. Textual query expansion as a means of enhancing image retrieval is a relatively unexplored area. In this paper, we first study the effect of expanding the textual query on the retrieval of both images and their associated text. Our study reveals that judicious expansion of the textual query through keyphrase extraction can lead to better results, either in text retrieval alone or in both image and text retrieval. To establish this, we use two well-known keyphrase extraction techniques, one based on tf-idf and the other on KEA. While query expansion increases retrieval efficiency, it is imperative that the expansion be semantically justified. We therefore propose a graph-based keyphrase extraction model that captures the relatedness between words in terms of both mutual information and relevance feedback. Most existing work has emphasized bridging the semantic gap by using textual and visual features, either in combination or individually. The way these text and image features are combined determines the efficacy of any retrieval. For this purpose, we adopt Fisher-LDA to determine the appropriate weight for each modality. This provides an intelligent decision-making process favoring the feature set to be infused into the final query. Our proposed algorithm is shown to significantly outperform the previously mentioned keyphrase extraction algorithms for query expansion. A rigorous set of experiments performed on the ImageCLEF-2011 Wikipedia Retrieval task dataset validates our claim that capturing the semantic relation between words through mutual information, followed by expansion of the textual query using relevance feedback, can simultaneously enhance both text and image retrieval.
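
The idea of relating words through mutual information and then expanding a query can be illustrated with a minimal sketch. This is not the authors' model: it assumes document-level co-occurrence, uses pointwise mutual information as the edge weight, ranks terms by weighted degree, and leaves out relevance feedback and the Fisher-LDA weighting entirely. The function names `pmi_graph`, `rank_terms`, and `expand_query` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_graph(docs):
    """Build a term graph: {(a, b): pmi} for word pairs with positive PMI.

    Assumption: co-occurrence is counted per document, and
    PMI(a, b) = log( p(a, b) / (p(a) * p(b)) ).
    """
    n = len(docs)
    token_sets = [set(d.lower().split()) for d in docs]
    word_df = Counter(w for s in token_sets for w in s)
    pair_df = Counter(p for s in token_sets for p in combinations(sorted(s), 2))
    edges = {}
    for (a, b), c in pair_df.items():
        pmi = math.log((c * n) / (word_df[a] * word_df[b]))
        if pmi > 0:  # keep only pairs that co-occur more often than chance
            edges[(a, b)] = pmi
    return edges


def rank_terms(edges):
    """Rank terms by the sum of their incident edge weights (weighted degree)."""
    score = Counter()
    for (a, b), w in edges.items():
        score[a] += w
        score[b] += w
    return [term for term, _ in score.most_common()]


def expand_query(query, ranked_terms, k=2):
    """Append the top-k ranked terms that are not already in the query."""
    seen = set(query.lower().split())
    extra = [t for t in ranked_terms if t not in seen][:k]
    return " ".join([query] + extra)
```

In this sketch, a call such as `expand_query("red", rank_terms(pmi_graph(docs)))` would append the highest-scoring graph terms to the original query; a real system would restrict candidates to terms related to the query and to documents judged relevant.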
ISSN: 0957-4174
EISSN: 1873-6793
DOI: 10.1016/j.eswa.2016.09.039