Crowd counting from single images using recursive multi-pathway zooming and foreground enhancement


Bibliographic details
Published in: Pattern Recognition, 2023-09, Vol. 141, p. 109585, Article 109585
Main authors: Ma, Junjie, Dai, Yaping, Jia, Zhiyang, Sun, Fuchun, Tan, Yap-Peng, Liu, Jun
Format: Article
Language: English
Description
Abstract:
•A Multi-Pathway Zooming Network is proposed, in which features at different resolutions are sequentially integrated and interact with one another during multi-glimpse behavior.
•A foreground enhancement scheme is designed to alleviate background noise and improve counting performance.
•The context information, combined with the output density map and the masked image, is recursively fine-tuned to boost counting performance.
•The Multi-Pathway Zooming Network outperforms most state-of-the-art crowd counting methods in counting evaluations.

Crowd counting is a challenging task owing to issues such as scale variation and noisy backgrounds. To handle these challenges, we propose a novel framework named Multi-Pathway Zooming Network (MZNet) in this paper. The proposed framework recursively optimizes multi-scale features using multiple zooming pathways and progressively enhances the foreground information to improve crowd counting performance. Each zooming pathway comprises two zooming directions: zooming in and zooming out. Convolutional features at different resolutions are propagated to optimize the context information at each specific level. By sequentially integrating multi-observation information and letting it interact, the optimized features handle scale variation effectively, which improves crowd counting performance. To address the noisy backgrounds that occur in many scenarios, we also introduce a new scheme that enhances foreground information by feeding the network a masked input image, formed by element-wise multiplication of a mask with the original image. Finally, the context information, combined with the output density map, is recursively fine-tuned in our network to boost counting performance. Extensive experiments on challenging benchmark datasets show competitive performance in both crowded and sparse scenarios.
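The abstract gives no implementation details, so the following is only a minimal, hypothetical pure-Python sketch of three generic operations it mentions. It forms a masked input image by element-wise multiplication of a mask with the image. It changes resolution in the two zooming directions, here assumed to be 2×2 average pooling for zooming out and nearest-neighbour upsampling for zooming in. It also reads a crowd count off a density map by summing it, which is the usual convention in density-based crowd counting. The function names (`apply_foreground_mask`, `zoom_out`, `zoom_in`, `count_from_density`) are illustrative and are not taken from MZNet; the network's learned convolutional pathways and recursive fine-tuning are not modeled.

```python
from typing import List

# Hypothetical single-channel H x W array stored as nested lists (illustration only).
Grid = List[List[float]]


def apply_foreground_mask(image: Grid, mask: Grid) -> Grid:
    """Element-wise product of an image and a same-sized mask (1 = foreground, 0 = background)."""
    if len(image) != len(mask) or any(len(r) != len(m) for r, m in zip(image, mask)):
        raise ValueError("image and mask must have the same shape")
    return [[p * w for p, w in zip(img_row, mask_row)] for img_row, mask_row in zip(image, mask)]


def zoom_out(grid: Grid) -> Grid:
    """Assumed 'zoom out': halve the resolution with 2x2 average pooling."""
    if not grid or len(grid) % 2 or len(grid[0]) % 2:
        raise ValueError("grid must be non-empty with even height and width")
    return [
        [(grid[i][j] + grid[i][j + 1] + grid[i + 1][j] + grid[i + 1][j + 1]) / 4.0
         for j in range(0, len(grid[0]), 2)]
        for i in range(0, len(grid), 2)
    ]


def zoom_in(grid: Grid) -> Grid:
    """Assumed 'zoom in': double the resolution with nearest-neighbour upsampling."""
    out: Grid = []
    for row in grid:
        wide = [v for v in row for _ in range(2)]  # repeat each value horizontally
        out.append(wide)
        out.append(list(wide))  # repeat the widened row vertically
    return out


def count_from_density(density: Grid) -> float:
    """Estimated crowd count = sum of all values in the predicted density map."""
    return sum(sum(row) for row in density)
```

For instance, `count_from_density([[0.5, 0.25], [0.25, 1.0]])` returns `2.0`. In a real network these steps would operate on multi-channel tensors, and the resampling would typically be combined with learned convolutions rather than used on its own.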
ISSN: 0031-3203, 1873-5142
DOI: 10.1016/j.patcog.2023.109585