Bi-Modal Learning With Channel-Wise Attention for Multi-Label Image Classification

Bibliographic Details
Published in: IEEE Access, 2020, Vol. 8, p. 9965-9977
Main Authors: Li, Peng; Chen, Peng; Xie, Yonghong; Zhang, Dezheng
Format: Article
Language: English
Online Access: Full text
Description
Abstract: Multi-label image classification is more in line with real-world applications. The problem is difficult because the complex label space makes it hard to obtain label-level attention regions and to model the semantic relationships among labels. Common deep network-based methods use a CNN to extract features and treat the labels as a sequence or a graph, handling label correlations with an RNN or graph-theoretical algorithms. In this paper, we propose a novel CNN-RNN-based model, the bi-modal multi-label learning (BMML) framework. First, an improved channel-wise attention mechanism is presented to produce regional attention maps and connect them to the corresponding labels. Then, based on the assumption that objects in a semantic scene have high-level relevance across visual and textual corpora, we embed the labels with different pre-trained language models and determine the label sequence in a "semantic space" constructed from large-scale textual data, thereby handling the labels in their semantic context. In addition, a cross-modal feature aligning module is introduced into the BMML framework. Experimental results show that BMML achieves better accuracy than mainstream multi-label classification methods on several benchmark data sets.
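The abstract does not specify the exact form of the improved channel-wise attention mechanism. For orientation only, the sketch below shows a conventional squeeze-and-excitation style channel attention block in PyTorch, which re-weights CNN feature channels before label-level attention maps would be derived. The class name, reduction ratio, and layer choices are illustrative assumptions, not the authors' implementation.

# Illustrative sketch only: a standard squeeze-and-excitation style channel-wise
# attention block. The paper's "improved" mechanism and its label-specific
# attention maps are not detailed in the abstract, so the layer choices and the
# reduction ratio below are assumptions rather than the authors' design.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Squeeze: global average pooling collapses each feature map to a scalar.
        self.pool = nn.AdaptiveAvgPool2d(1)
        # Excitation: a small bottleneck MLP produces one weight per channel.
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.pool(x).view(b, c)       # (B, C) channel descriptors
        w = self.fc(w).view(b, c, 1, 1)   # (B, C, 1, 1) attention weights
        return x * w                      # re-weight CNN feature maps channel-wise

# Example: re-weighting a ResNet-style feature map of shape (B, 2048, 7, 7).
feat = torch.randn(2, 2048, 7, 7)
attn = ChannelAttention(2048)
out = attn(feat)  # same shape, with channels emphasized or suppressed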
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2020.2964599