CrowdFormer: Weakly-supervised crowd counting with improved generalizability

Convolutional neural networks (CNNs) have dominated the field of computer vision for nearly a decade. However, due to their limited receptive field, CNNs fail to model the global context. On the other hand, transformers, an attention-based architecture, can model the global context easily. Despite t...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Journal of visual communication and image representation 2023-06, Vol.94, p.103853, Article 103853
Hauptverfasser: Savner, Siddharth Singh, Kanhangad, Vivek
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Convolutional neural networks (CNNs) have dominated the field of computer vision for nearly a decade. However, due to their limited receptive field, CNNs fail to model the global context. On the other hand, transformers, an attention-based architecture, can model the global context easily. Despite this, there are limited studies that investigate the effectiveness of transformers in crowd counting. In addition, the majority of the existing crowd-counting methods are based on the regression of density maps which requires point-level annotation of each person present in the scene. This annotation task is laborious and also error-prone. This has led to an increased focus on weakly-supervised crowd-counting methods, which require only count-level annotations. In this paper, we propose a weakly-supervised method for crowd counting using a pyramid vision transformer. We have conducted extensive evaluations to validate the effectiveness of the proposed method. Our method achieves state-of-the-art performance. More importantly, it shows remarkable generalizability. •A pyramid vision transformer-based weakly-supervised crowd counting method.•An MLP based multiscale feature aggregation and regression module.•New benchmarks for cross-dataset performance in crowd counting.•A novel end-to-end attention-based crowd-counting method employing smooth-L1 loss.
ISSN:1047-3203
1095-9076
DOI:10.1016/j.jvcir.2023.103853