NOAH: Learning Pairwise Object Category Attentions for Image Classification

A modern deep neural network (DNN) for image classification tasks typically consists of two parts: a backbone for feature extraction, and a head for feature encoding and class predication. We observe that the head structures of mainstream DNNs adopt a similar feature encoding pipeline, exploiting gl...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: Li, Chao, Zhou, Aojun, Yao, Anbang
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:A modern deep neural network (DNN) for image classification tasks typically consists of two parts: a backbone for feature extraction, and a head for feature encoding and class predication. We observe that the head structures of mainstream DNNs adopt a similar feature encoding pipeline, exploiting global feature dependencies while disregarding local ones. In this paper, we revisit the feature encoding problem, and propose Non-glObal Attentive Head (NOAH) that relies on a new form of dot-product attention called pairwise object category attention (POCA), efficiently exploiting spatially dense category-specific attentions to augment classification performance. NOAH introduces a neat combination of feature split, transform and merge operations to learn POCAs at local to global scales. As a drop-in design, NOAH can be easily used to replace existing heads of various types of DNNs, improving classification performance while maintaining similar model efficiency. We validate the effectiveness of NOAH on ImageNet classification benchmark with 25 DNN architectures spanning convolutional neural networks, vision transformers and multi-layer perceptrons. In general, NOAH is able to significantly improve the performance of lightweight DNNs, e.g., showing 3.14\%|5.3\%|1.9\% top-1 accuracy improvement to MobileNetV2 (0.5x)|Deit-Tiny (0.5x)|gMLP-Tiny (0.5x). NOAH also generalizes well when applied to medium-size and large-size DNNs. We further show that NOAH retains its efficacy on other popular multi-class and multi-label image classification benchmarks as well as in different training regimes, e.g., showing 3.6\%|1.1\% mAP improvement to large ResNet101|ViT-Large on MS-COCO dataset. Project page: https://github.com/OSVAI/NOAH.
DOI:10.48550/arxiv.2402.02377