M2FN: Multi-step modality fusion for advertisement image assessment


Bibliographic Details
Published in: Applied Soft Computing, 2021-05, Vol. 103, p. 107116, Article 107116
Authors: Park, Kyung-Wha; Ha, Jung-Woo; Lee, JungHoon; Kwon, Sunyoung; Kim, Kyung-Min; Zhang, Byoung-Tak
Format: Article
Language: English
Online access: Full text
Summary: Assessing advertisements, specifically on the basis of user preferences and ad quality, is crucial to the marketing industry. Although recent studies have attempted to use deep neural networks for this purpose, these studies have not utilized image-related auxiliary attributes, which include embedded text frequently found in ad images. We, therefore, investigated the influence of these attributes on ad image preferences. First, we analyzed large-scale real-world ad log data and, based on our findings, proposed a novel multi-step modality fusion network (M2FN) that determines advertising images likely to appeal to user preferences. Our method utilizes auxiliary attributes through multiple steps in the network, which include conditional batch normalization-based low-level fusion and attention-based high-level fusion. We verified M2FN on the AVA dataset, which is widely used for aesthetic image assessment, and then demonstrated that M2FN can achieve state-of-the-art performance in preference prediction using a real-world ad dataset with rich auxiliary attributes.

Highlights:
• We propose a novel deep neural network model for advertisement image assessment.
• It fuses modalities via conditional batch normalization and an attention mechanism.
• We leverage visual–linguistic attributes along with classic metadata.
• We show state-of-the-art results for aesthetic and advertisement image assessment.
• We show the results of ablation studies and visualization to support our claims.
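The abstract names conditional batch normalization (CBN) as the low-level fusion step: an auxiliary-attribute embedding modulates the per-channel scale and shift of batch normalization applied to visual features. The sketch below illustrates that general CBN idea only; the shapes, projection matrices, and function names are illustrative assumptions, not the paper's actual architecture.

```python
# Illustrative sketch of conditional batch normalization (CBN) fusion.
# An attribute embedding predicts the BN scale (gamma) and shift (beta),
# letting auxiliary attributes modulate visual features channel-wise.
# W_gamma / W_beta are hypothetical learned projections.
import numpy as np

def conditional_batch_norm(x, attr, W_gamma, W_beta, eps=1e-5):
    """x: (batch, channels) visual features; attr: (batch, d) attribute embedding.
    W_gamma, W_beta: (d, channels) projections predicting BN parameters."""
    # Standard batch normalization over the batch dimension.
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    # Attribute-conditioned scale and shift (gamma initialized near 1).
    gamma = 1.0 + attr @ W_gamma
    beta = attr @ W_beta
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))        # toy visual features
attr = rng.normal(size=(4, 3))     # toy auxiliary-attribute embedding
W_gamma = rng.normal(scale=0.1, size=(3, 8))
W_beta = rng.normal(scale=0.1, size=(3, 8))
y = conditional_batch_norm(x, attr, W_gamma, W_beta)
print(y.shape)  # (4, 8)
```

With a zero attribute embedding, gamma is 1 and beta is 0, so the layer reduces to plain batch normalization; the attributes only perturb that baseline, which is the appeal of CBN as a low-level fusion mechanism.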
ISSN: 1568-4946; 1872-9681
DOI: 10.1016/j.asoc.2021.107116