Crowd Counting via Unsupervised Cross-Domain Feature Adaptation

Given an image, crowd counting aims to estimate the amount of target objects in the image. With un-predictable installation situations of surveillance systems (or other equipments), crowd counting images from different data sets may exhibit severe discrepancies in viewing angle, scale, lighting cond...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	IEEE transactions on multimedia 2023, Vol.25, p.4665-4678
Hauptverfasser:	Ding, Guanchen, Yang, Daiqin, Wang, Tao, Wang, Sihan, Zhang, Yunfei
Format:	Artikel
Sprache:	eng
Schlagworte:	Adaptation Adaptation models adversarial learning Algorithms Annotations Bridges Crowd monitoring Data models Datasets density map estimation Discriminators domain adaptation Feature extraction Modules Semantics Surveillance systems Training Unsupervised crowd counting Unsupervised learning
Online-Zugang:	Volltext bestellen
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Given an image, crowd counting aims to estimate the amount of target objects in the image. With un-predictable installation situations of surveillance systems (or other equipments), crowd counting images from different data sets may exhibit severe discrepancies in viewing angle, scale, lighting condition, etc. As it is usually expensive and time-consuming to annotate each data set for model training, it has been an essential issue in crowd counting to transfer a well-trained model on a labeled data set (source domain) to a new data set (target domain). To tackle this problem, we propose a cross-domain learning network to learn the domain gaps in an unsupervised learning manner. The proposed network comprises of a Multi-granularity Feature-aware Discriminator (MFD) module, a Domain-invariant Feature Adaptation (DFA) module, and a Cross-domain Vanishing Bridge (CVB) module to remove domain-specific information from the extracted features and promote the mapping performances of the network. Unlike most existing methods that use only Global Feature Discriminator (GFD) to align features at image level, an additional Local Feature Discriminator (LFD) is inserted and together with GFD form the MFD module. As a complement to MFD, LFD refines features at pixel level and has the ability to align local features. The DFA module explicitly measures the distances between the source domain features and the target domain features and aligns the marginal distribution of their features with Maximum Mean Discrepancy (MMD). Finally, the CVB module provides an incremental capability of removing the impact of interfering part of the extracted features. Several well-known networks are adopted as the backbone of our algorithm to prove the effectiveness of the proposed adaptation structure. Comprehensive experiments demonstrate that our model achieves competitive performance to the state-of-the-art methods.
ISSN:	1520-9210 1941-0077
DOI:	10.1109/TMM.2022.3180222