Cross-layer progressive attention bilinear fusion method for fine-grained visual classification
Published in: | Journal of Visual Communication and Image Representation, 2022-01, Vol. 82, p. 103414, Article 103414 |
---|---|
Main authors: | , , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | • We propose Cross-Layer Attention, which selects high-level and low-level features in the backbone network to localize objects accurately and strengthen discriminative features. • We propose the Cross-Layer Bilinear Fusion Module, which multiplies features from different layers in a bilinear manner; the resulting features are merged into the last features of the backbone. • We divide the training process into stages and adjust the parameters through Progressive Training so that the parameters in each stage are tuned toward their best state. • We conduct extensive experiments on four challenging datasets and achieve state-of-the-art performance. • The CPABF method improves feature representation ability and offers a new perspective for FGVC and related computer vision tasks.
Fine-grained visual classification (FGVC) is a critical task in computer vision. However, FGVC is challenging because of the large intra-class variation and small inter-class variation among the classes to be distinguished. The key to the problem is to capture subtle visual differences in the image and to represent the discriminative features effectively. Existing methods are often limited by insufficient localization accuracy and insufficient feature representation capability. In this paper, we propose a cross-layer progressive attention bilinear fusion (CPABF) method, which efficiently expresses the characteristics of discriminative regions. The CPABF method involves three components: 1) Cross-Layer Attention (CLA), which locates and reinforces the discriminative region at low computational cost; 2) the Cross-Layer Bilinear Fusion Module (CBFM), which integrates semantic information from the low level to the high level; and 3) Progressive Training, which optimizes the network parameters stage by stage toward their best state. CPABF performs well on four FGVC datasets and outperforms some state-of-the-art methods. |
---|---|
ISSN: | 1047-3203 1095-9076 |
DOI: | 10.1016/j.jvcir.2021.103414 |
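
The summary above describes the Cross-Layer Bilinear Fusion Module only in words: features from different layers are multiplied in a bilinear manner and merged into the last features of the backbone. The sketch below is one possible reading of that sentence, not the authors' implementation. It assumes the two feature maps have already been resized to the same number of spatial positions, uses the signed square root and L2 normalisation that commonly follow bilinear pooling, and merges by simple concatenation. The function names (`bilinear_pool`, `signed_sqrt_l2`, `global_avg_pool`, `fuse`) are hypothetical and do not come from the paper.

```python
import math


def bilinear_pool(x, y):
    """Average outer product of two feature maps over shared spatial positions.

    x: list of C1 channels, each a list of N activations
    y: list of C2 channels, each a list of N activations
    Returns a flat list of length C1 * C2 (row-major over x's channels).
    """
    n = len(x[0])
    if any(len(ch) != n for ch in x + y):
        raise ValueError("all channels must cover the same N positions")
    return [sum(a * b for a, b in zip(cx, cy)) / n for cx in x for cy in y]


def signed_sqrt_l2(v, eps=1e-12):
    """Signed square root followed by L2 normalisation (assumed post-processing)."""
    s = [math.copysign(math.sqrt(abs(t)), t) for t in v]
    norm = math.sqrt(sum(t * t for t in s))
    return [t / (norm + eps) for t in s]


def global_avg_pool(x):
    """Mean activation per channel."""
    return [sum(ch) / len(ch) for ch in x]


def fuse(low, high):
    """Hypothetical merge: pooled high-level features plus the cross-layer descriptor."""
    cross = signed_sqrt_l2(bilinear_pool(low, high))
    return global_avg_pool(high) + cross


if __name__ == "__main__":
    low = [[1.0, 2.0], [0.0, 1.0]]               # 2 channels, 2 positions
    high = [[1.0, 1.0], [2.0, 0.0], [0.0, 4.0]]  # 3 channels, 2 positions
    print(len(fuse(low, high)))                  # 3 pooled values + 2 * 3 bilinear terms
```

With 2 low-level and 3 high-level channels, the fused vector has 3 + 2 × 3 = 9 entries. A real network would apply the same idea to convolutional feature tensors and typically reduce the channel count first, since the bilinear descriptor grows with the product of the two channel counts.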