Being a Supercook: Joint Food Attributes and Multimodal Content Modeling for Recipe Retrieval and Exploration

Bibliographic Details
Published in: IEEE Transactions on Multimedia, May 2017, Vol. 19 (5), pp. 1100-1113
Authors: Weiqing Min, Shuqiang Jiang, Jitao Sang, Huayang Wang, Xinda Liu, Luis Herranz
Format: Article
Language: English
Description
Abstract: This paper considers the problem of recipe-oriented image-ingredient correlation learning with multiple attributes for recipe retrieval and exploration. Existing methods mainly focus on visual food information for recognition, whereas we model visual information, textual content (e.g., ingredients), and attributes (e.g., cuisine and course) together to solve extended recipe-oriented problems, such as multimodal cuisine classification and attribute-enhanced food image retrieval. As a solution, we propose a multimodal multitask deep belief network (M³TDBN) to learn a joint image-ingredient representation regularized by different attributes. By grouping ingredients into visible ingredients (those visible in the food image, e.g., "chicken" and "mushroom") and nonvisible ingredients (e.g., "salt" and "oil"), M³TDBN is capable of learning both a midlevel visual representation between images and visible ingredients and a nonvisual representation. Furthermore, to exploit the different attributes for improving the intermodality correlation, M³TDBN incorporates multitask learning so that the attributes collaborate with each other. Based on the proposed M³TDBN, we exploit the derived deep features and the discovered correlations for three novel applications: 1) multimodal cuisine classification; 2) attribute-augmented cross-modal recipe image retrieval; and 3) ingredient and attribute inference from food images. The proposed approach is evaluated on the constructed Yummly dataset, and the results validate its effectiveness.
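For a concrete picture of the data flow the abstract describes, the NumPy sketch below wires image features and visible-ingredient indicators into a shared joint layer, feeds nonvisible ingredients in alongside them, and attaches one attribute head per task (cuisine, course) to the joint code in multitask fashion. All layer sizes, the sigmoid/softmax choices, and fusion by concatenation are illustrative assumptions; the paper's actual model is a deep belief network trained layer-wise, which this single forward pass does not reproduce.

# A minimal forward-pass sketch of a multimodal multitask model in the
# spirit of M³TDBN. Dimensions and weights are hypothetical; a real model
# would be trained (the paper uses layer-wise DBN pretraining).
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Assumed dimensions: 4096-d image features, 200 visible ingredients,
# 120 nonvisible ingredients, 512-d joint representation.
D_IMG, D_VIS, D_NONVIS, D_JOINT = 4096, 200, 120, 512
N_CUISINE, N_COURSE = 20, 8  # example attribute vocabularies

# Modality-specific projections, then a shared joint layer.
W_img = rng.normal(0, 0.01, (D_IMG, 256))
W_vis = rng.normal(0, 0.01, (D_VIS, 256))
W_nonvis = rng.normal(0, 0.01, (D_NONVIS, 128))
W_joint = rng.normal(0, 0.01, (256 + 256 + 128, D_JOINT))

# One multitask head per attribute, all reading the same joint code.
W_cuisine = rng.normal(0, 0.01, (D_JOINT, N_CUISINE))
W_course = rng.normal(0, 0.01, (D_JOINT, N_COURSE))

def forward(img, vis_ing, nonvis_ing):
    """Map one recipe's modalities to a joint code and attribute scores."""
    h_img = sigmoid(img @ W_img)               # midlevel visual code
    h_vis = sigmoid(vis_ing @ W_vis)           # code for image-visible ingredients
    h_nonvis = sigmoid(nonvis_ing @ W_nonvis)  # nonvisual ingredient code
    joint = sigmoid(np.concatenate([h_img, h_vis, h_nonvis]) @ W_joint)
    return joint, softmax(joint @ W_cuisine), softmax(joint @ W_course)

# Toy input: random image features, sparse binary ingredient indicators.
img = rng.normal(size=D_IMG)
vis = (rng.random(D_VIS) < 0.05).astype(float)
nonvis = (rng.random(D_NONVIS) < 0.05).astype(float)
joint, p_cuisine, p_course = forward(img, vis, nonvis)
print(joint.shape, p_cuisine.argmax(), p_course.argmax())

For the cross-modal retrieval application, one natural use of such a model is to rank recipes by cosine similarity between joint codes computed from the image side and the ingredient side; that pairing strategy is an assumption here, not a detail given in the abstract.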
ISSN: 1520-9210, 1941-0077
DOI: 10.1109/TMM.2016.2639382