Multiscale Modality-Similar Learning Guided Weakly Supervised RGB-T Crowd Counting

Bibliographic details
Published in: IEEE Sensors Journal 2024-09, Vol. 24 (18), p. 29121-29134
Main authors: Kong, Weihang; Li, He; Zhao, Fengda
Format: Article
Language: English
Description
Abstract: With the development of sensor technology and its many applications in intelligent surveillance systems, RGB-thermal (RGB-T) cross-modal crowd counting, which uses data from different sensors as source data, has received extensive attention from academia and industry. In terms of feature extraction, existing cross-modal methods mainly adopt multiple parallel large convolution kernels to handle the pronounced variation in crowd scale, which results in a large number of parameters. In terms of supervision, existing cross-modal crowd-counting methods adopt a fully supervised framework, which requires time-consuming and laborious pixel-level supervision. To address both issues, this article proposes a multiscale modality-similar guided weakly supervised cross-modal crowd-counting method, comprising a multiscale context-level feature fusion (MCFF) module and a modality-similar weakly supervised framework. Specifically, the proposed multiscale module equivalently decouples the square convolution along different directions to address feature redundancy and parameter growth. The proposed weakly supervised framework exploits the similarity of cross-modal crowd semantic features to bootstrap the model using only image-level supervision. Experimental results on two public RGB-T benchmarks, one RGB-D benchmark, and collected real-world data show that the proposed weakly supervised method achieves counting accuracy competitive with representative fully supervised methods. Extensive ablation studies confirm that the core modules improve the final counting performance.
ISSN: 1530-437X, 1558-1748
DOI: 10.1109/JSEN.2024.3436859
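
Note on the parameter claim in the abstract: the abstract states that decoupling a square convolution along different directions reduces the parameter count, but it does not give the exact decomposition. The sketch below is only an illustration under an assumption that is not confirmed by the record: a k x k kernel is replaced by a 1 x k kernel followed by a k x 1 kernel, with the intermediate channel count equal to the output channel count. The function names are hypothetical, and the code only counts weights; it is not the authors' MCFF module.

```python
def square_conv_params(c_in, c_out, k):
    """Weights in one k x k convolution (bias terms ignored)."""
    return c_in * c_out * k * k


def factorized_conv_params(c_in, c_out, k):
    """Weights in a 1 x k convolution followed by a k x 1 convolution.

    Assumption: the intermediate feature map has c_out channels.
    Bias terms are ignored.
    """
    horizontal = c_in * c_out * k   # 1 x k kernel
    vertical = c_out * c_out * k    # k x 1 kernel
    return horizontal + vertical


if __name__ == "__main__":
    # Compare both layouts for a few kernel sizes at 64 channels.
    for k in (3, 5, 7):
        square = square_conv_params(64, 64, k)
        factorized = factorized_conv_params(64, 64, k)
        print(f"k={k}: square={square}, factorized={factorized}")
```

Under this assumption the square kernel grows with k squared while the factorized pair grows linearly in k, so the saving widens as the kernel gets larger. A 1 x k plus k x 1 pair reproduces a k x k kernel exactly only in special cases, so the count says nothing about how the article achieves equivalence.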