CC-DETR: DETR with Hybrid Context and Multi-Scale Coordinate Convolution for Crowd Counting

Bibliographic Details
Published in: Mathematics (Basel) 2024-05, Vol.12 (10), p.1562
Main Authors: Gu, Yanhong; Zhang, Tao; Hu, Yuxia; Nian, Fudong
Format: Article
Language: English
Online Access: Full text
Description
Summary: Prevailing crowd counting approaches rely primarily on density map regression. Despite substantial progress, large scale variations and complex background interference within a single image remain challenging. To address these issues, we propose a DETR-based crowd counting framework called Crowd Counting DETR (CC-DETR), which extends the state-of-the-art DETR object detection framework to the crowd counting task. In CC-DETR, a DETR-like encoder–decoder structure (Hybrid Context DETR, HCDETR) handles complex visual information by fusing features from hybrid semantic levels through a transformer. In addition, a Coordinate Dilated Convolution Module (CDCM) exploits position-sensitive context information at different scales. Extensive experiments on three challenging crowd counting datasets (ShanghaiTech, UCF-QNRF, and NWPU) show that the model is effective and competitive with state-of-the-art crowd counting models.
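The summary names a module that combines coordinate information with dilated convolutions at several scales but gives no architectural detail. The following is only a rough sketch of that general idea, not the authors' CDCM: it appends normalized row and column coordinate channels to a feature map and sums 3x3 convolutions run at several dilation rates. The function names, the dilation rates (1, 2, 3), the zero padding, and the fusion by summation are all assumptions made for illustration.

```python
def add_coord_channels(feature):
    """Append row and column coordinate channels, normalized to [-1, 1].

    `feature` is a list of channels, each an H x W list of lists.
    (Hypothetical helper; illustrates the coordinate-convolution idea.)
    """
    h, w = len(feature[0]), len(feature[0][0])

    def norm(i, n):
        return 0.0 if n == 1 else 2.0 * i / (n - 1) - 1.0

    ys = [[norm(r, h) for _ in range(w)] for r in range(h)]
    xs = [[norm(c, w) for c in range(w)] for _ in range(h)]
    return feature + [ys, xs]


def dilated_conv3x3(channels, kernels, dilation):
    """Single-output 3x3 convolution with zero padding.

    `kernels` holds one 3x3 kernel per input channel; `dilation` is the
    spacing between sampled positions.
    """
    h, w = len(channels[0]), len(channels[0][0])
    out = [[0.0] * w for _ in range(h)]
    for ch, k in zip(channels, kernels):
        for r in range(h):
            for c in range(w):
                for i in (-1, 0, 1):
                    for j in (-1, 0, 1):
                        rr, cc = r + i * dilation, c + j * dilation
                        if 0 <= rr < h and 0 <= cc < w:
                            out[r][c] += k[i + 1][j + 1] * ch[rr][cc]
    return out


def multi_scale_coord_conv(feature, kernels, dilations=(1, 2, 3)):
    """Sum dilated-convolution branches over a coordinate-augmented map.

    Summation as the fusion step is an assumption, not taken from the paper.
    """
    with_coords = add_coord_channels(feature)
    branches = [dilated_conv3x3(with_coords, kernels, d) for d in dilations]
    h, w = len(branches[0]), len(branches[0][0])
    return [[sum(b[r][c] for b in branches) for c in range(w)]
            for r in range(h)]
```

A practical model would use learned kernels and a tensor library; the nested-list form here only makes the indexing of the dilation and coordinate channels explicit.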
ISSN:2227-7390
DOI:10.3390/math12101562