Decomposing generation networks with structure prediction for recipe generation

Bibliographic Details
Published in:	Pattern recognition 2022-06, Vol.126, p.108578, Article 108578
Main authors:	Wang, Hao; Lin, Guosheng; Hoi, Steven C.H.; Miao, Chunyan
Format:	Article
Language:	English
Subjects:	Text generation; Vision-and-language
Online access:	Full text

Description
Summary:
• We divide the recipes into phases, and use a global structure prediction component to assign different sub-generators to generate recipe phases.
• We incorporate the attention mechanism to get the phase-aware features, which are the input for different sub-generators to produce better recipe phases.
• Our proposed framework outperforms previous state-of-the-art results.

Recipe generation from food images and ingredients is a challenging task that requires interpreting information from another modality. Unlike image captioning, where a caption usually consists of one sentence, cooking instructions contain multiple sentences and have an obvious structure. To help the model capture the recipe structure and avoid missing cooking details, we propose a novel framework, Decomposing Generation Networks (DGN) with structure prediction, to obtain more structured and complete recipe generation outputs. Specifically, we split each cooking instruction into several phases and assign different sub-generators to each phase. Our approach includes two novel ideas: (i) learning the recipe structures with the global structure prediction component and (ii) producing recipe phases in the sub-generator output component based on the predicted structure. Extensive experiments on the challenging large-scale Recipe1M dataset validate the effectiveness of our proposed model, which improves on previous state-of-the-art results.
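The decomposition described in the summary (predict a global phase structure, compute phase-aware features with attention, then let one sub-generator write each phase) can be illustrated with a toy sketch. This is not the authors' implementation: the phase labels, the query vectors, the fixed-rule structure predictor, and the string-producing sub-generators are all hypothetical stand-ins, and plain dot-product attention is assumed because the record does not say which attention variant is used.

```python
import math

# Hypothetical phase labels; the record does not name the actual phases.
PHASES = ["prepare", "cook", "finish"]

# Hypothetical per-phase query vectors (a real model would learn these).
PHASE_QUERIES = {"prepare": [1.0, 0.0], "cook": [0.0, 1.0], "finish": [1.0, 1.0]}


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def phase_aware_features(features, query):
    """Dot-product attention: weight each input vector by its similarity
    to the phase query and return the weighted sum."""
    scores = [sum(x * q for x, q in zip(vec, query)) for vec in features]
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(w * vec[i] for w, vec in zip(weights, features)) for i in range(dim)]


def predict_structure(features):
    """Stand-in for the global structure prediction component.
    Assumption: it returns an ordered list of phase labels. Here the
    input is ignored and a fixed order is returned; a real component
    would be learned from data."""
    return list(PHASES)


def make_subgenerator(label):
    """Build a placeholder sub-generator for one phase. A real one would
    be a text decoder; this one only reports what it was conditioned on."""
    def generate(phase_features):
        shown = ", ".join(f"{v:.2f}" for v in phase_features)
        return f"[{label}] text conditioned on ({shown})"
    return generate


SUBGENERATORS = {label: make_subgenerator(label) for label in PHASES}


def generate_recipe(features):
    """Predict the structure, then dispatch each phase to its sub-generator."""
    recipe = []
    for label in predict_structure(features):
        attended = phase_aware_features(features, PHASE_QUERIES[label])
        recipe.append(SUBGENERATORS[label](attended))
    return recipe


# Two toy input vectors standing in for image and ingredient features.
recipe = generate_recipe([[0.9, 0.1], [0.2, 0.8]])
for line in recipe:
    print(line)
```

The point of the sketch is the control flow: the structure is decided once, globally, and each phase is then generated from its own attended view of the same inputs rather than from a single shared decoder state.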
ISSN:	0031-3203; 1873-5142
DOI:	10.1016/j.patcog.2022.108578