Dual-branch counting method for dense crowd based on self-attention mechanism

Bibliographic details
Published in: Expert systems with applications 2024-02, Vol.236, p.121272, Article 121272
Main authors: Wang, Yongjie; Wang, Feng; Huang, Dongyang
Format: Article
Language: English
Description
Summary: This paper proposes a dense crowd counting method based on a self-attention mechanism with a dual-branch fusion network. The method aims to address large variations in head scale and complex backgrounds in dense crowd images. It combines the CNN and Transformer network frameworks and consists of a shallow feature extraction network, a dual-branch fusion network, and a deep feature extraction network. The shallow feature extraction network employs VGG16 to extract low-level features. The dual-branch fusion network comprises a multi-scale CNN branch and a Transformer branch built on an improved self-attention module, which collect local and global information on crowd areas, respectively. The deep feature extraction network employs a Transformer network based on a mixed attention module to further separate complicated backgrounds and concentrate on crowd areas. Both counting-level weakly supervised and location-level fully supervised methods are employed in the experiments. On four widely used datasets, the results show that the proposed method outperforms the most recent research. Compared with existing weakly supervised methods, the method has a higher counting accuracy with low parameter volumes and a counting accuracy of 89.1% under full supervision. The experimental results demonstrate that the method has excellent crowd counting performance and can count accurately in high-density and high-occlusion scenes.
ISSN: 0957-4174, 1873-6793
DOI: 10.1016/j.eswa.2023.121272