A Lightweight Object Counting Network Based On Density Map Knowledge Distillation


Bibliographic details
Published in: IEEE Transactions on Circuits and Systems for Video Technology, 2024-09, p. 1-1
Main authors: Shen, Zhilong; Li, Guoquan; Xia, Ruiyang; Meng, Hongying; Huang, Zhengwen
Format: Article
Language: eng
Description
Summary: Object counting aims to accurately count the object instances in an image, and efficient operation is essential. However, most current CNN-based methods rely on complex network architectures and therefore consume significant memory, time, and other resources at runtime. This seriously limits their deployment in practical scenarios such as public safety and agricultural planting. Therefore, we propose a lightweight object counting method named EdgeCount that effectively balances inference speed and counting accuracy. Specifically, we construct a network composed of a student model (EdgeCount) and a teacher model (EdgeCount-T) with the same encoder-decoder structure, based on density map knowledge distillation (DMKD), allowing EdgeCount to learn the object density distribution from EdgeCount-T. We then introduce spatial and channel reconstruction convolution (SCConv), composed of a spatial reconstruction unit (SRU) and a channel reconstruction unit (CRU), to reduce spatial and channel redundancy at lower computational cost. Moreover, a low-parameter weighted multi-scale feature fusion module (LWMFFM) is designed to further improve counting ability by segmenting minor structural discrepancies among multi-scale features. Extensive experiments on challenging remote sensing and dense crowd object counting datasets demonstrate the effectiveness and superiority of our method. In particular, on four NVIDIA Jetson devices, EdgeCount counts objects accurately with only 0.12M parameters and 19.87M floating-point operations (FLOPs) at an input size of 128, achieving the lowest latency and highest FPS among state-of-the-art object counters.
ISSN: 1051-8215, 1558-2205
DOI: 10.1109/TCSVT.2024.3469933