NOAH: Learning Pairwise Object Category Attentions for Image Classification
Saved in:
Main Authors: | , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online Access: | Order full text |
Summary: | A modern deep neural network (DNN) for image classification typically
consists of two parts: a backbone for feature extraction, and a head for
feature encoding and class prediction. We observe that the head structures of
mainstream DNNs adopt a similar feature encoding pipeline, exploiting global
feature dependencies while disregarding local ones. In this paper, we revisit
the feature encoding problem and propose the Non-glObal Attentive Head (NOAH),
which relies on a new form of dot-product attention called pairwise object
category attention (POCA), efficiently exploiting spatially dense
category-specific attentions to improve classification performance. NOAH
introduces a neat combination of feature split, transform and merge operations
to learn POCAs at local to global scales. As a drop-in design, NOAH can easily
replace the existing heads of various types of DNNs, improving classification
performance while maintaining similar model efficiency. We validate the
effectiveness of NOAH on the ImageNet classification benchmark with 25 DNN
architectures spanning convolutional neural networks, vision transformers and
multi-layer perceptrons. In general, NOAH significantly improves the
performance of lightweight DNNs, e.g., yielding top-1 accuracy gains of
3.14%|5.3%|1.9% on MobileNetV2 (0.5x)|DeiT-Tiny (0.5x)|gMLP-Tiny (0.5x). NOAH
also generalizes well when applied to medium-size and large-size DNNs. We
further show that NOAH retains its efficacy on other popular multi-class and
multi-label image classification benchmarks as well as in different training
regimes, e.g., yielding mAP gains of 3.6%|1.1% for ResNet101|ViT-Large on the
MS-COCO dataset. Project page: https://github.com/OSVAI/NOAH. |
---|---|
DOI: | 10.48550/arxiv.2402.02377 |
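The abstract names the ingredients of NOAH (dot-product attention producing spatially dense, category-specific POCA maps, combined through feature split, transform and merge steps) but not the exact architecture. The PyTorch sketch below is one plausible reading of that description, not the authors' released implementation; see the project page for the real code. The class name `POCAHead`, the 1x1-convolution key/value projections, the per-split parameters and the averaging merge are all assumptions.

```python
# Speculative sketch of a POCA-style classification head, reconstructed only
# from the abstract; names and design details here are assumptions, not the
# authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class POCAHead(nn.Module):
    """Drop-in head producing class logits via dense per-category attention.

    Each feature split gets its own 1x1 "key" and "value" projections into
    num_classes channels (the transform step). A spatial softmax over the keys
    gives a dense, category-specific attention map that weights the values
    before spatial aggregation; the split outputs are then averaged into the
    final logits (the merge step).
    """

    def __init__(self, in_channels: int, num_classes: int, num_splits: int = 4):
        super().__init__()
        assert in_channels % num_splits == 0
        self.num_splits = num_splits
        split_channels = in_channels // num_splits
        self.key_projs = nn.ModuleList(
            nn.Conv2d(split_channels, num_classes, kernel_size=1)
            for _ in range(num_splits)
        )
        self.val_projs = nn.ModuleList(
            nn.Conv2d(split_channels, num_classes, kernel_size=1)
            for _ in range(num_splits)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) feature map from the backbone.
        splits = torch.chunk(x, self.num_splits, dim=1)  # split step
        logits = []
        for feat, key_proj, val_proj in zip(splits, self.key_projs, self.val_projs):
            keys = key_proj(feat).flatten(2)  # (B, num_classes, H*W)
            vals = val_proj(feat).flatten(2)  # (B, num_classes, H*W)
            attn = F.softmax(keys, dim=-1)    # dense per-category spatial attention
            logits.append((attn * vals).sum(dim=-1))  # attentive pooling
        return torch.stack(logits).mean(dim=0)  # merge step


if __name__ == "__main__":
    # Stands in for the usual global-average-pool + linear classifier, e.g.
    # on a ResNet-50 backbone whose final feature map is (B, 2048, 7, 7).
    head = POCAHead(in_channels=2048, num_classes=1000)
    print(head(torch.randn(2, 2048, 7, 7)).shape)  # torch.Size([2, 1000])
```

Because such a head consumes the backbone's spatial feature map directly and returns class logits of the same shape as a linear classifier, it can replace the existing head of a CNN, vision transformer or MLP backbone without touching the rest of the training pipeline, which matches the "drop-in design" claim in the abstract.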