Pedestrian detection network with multi-modal cross-guided learning
Saved in:
| Published in: | Digital signal processing 2022-04, Vol. 122, p. 103370, Article 103370 |
|---|---|
| Main authors: | , , , , , |
| Format: | Article |
| Language: | English |
| Subjects: | |
| Online access: | Full text |
Summary:

Most of the current feature generation modules based on infrared and visible modalities are independent of each other, lacking long-term dependence between the modalities. This results in large differences between the modal features, which weakens fusion and leads to false and missed detections of targets. To tackle this problem, a pedestrian detection network with multi-modal cross-guided learning was proposed. First, the paired multi-modal images were sent to the feature generation modules to generate deep and shallow features. Starting from the middle stage, the paired multi-modal features were sent to the designed weight-aware module, which output the weighted features of each modality together with the fusion features. The weighted features of each modality were then returned to the feature generation module of the other modality, so that the weighted information was gradually transmitted to the next stage in a joint cross-guidance manner, establishing long-term dependence between the modalities. At the same time, the fusion features were also input to the weight-aware module of the next stage to strengthen the connection between fusion features at different stages and obtain more discriminative features. Finally, both the modality-specific features and the fusion features were sent to the detection module to generate the positions and classification scores of pedestrian targets. The experimental results indicated that the average precision on the KAIST multispectral pedestrian detection dataset reached 77.16%, and the log-average miss rate dropped to 25.03%, a reduction of 29.77% compared with the baseline.
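The staged cross-guidance described in the abstract lends itself to a compact sketch. The PyTorch-style code below is a minimal illustration only, assuming channel-attention-style weighting, additive cross-guidance, and a fixed channel width across stages; the module names (`WeightAwareModule`, `CrossGuidedStage`) and all layer choices are hypothetical and not taken from the paper.

```python
# Minimal sketch of one cross-guided stage; names, gating form, and channel
# choices are illustrative assumptions, not the authors' exact design.
import torch
import torch.nn as nn


class WeightAwareModule(nn.Module):
    """Outputs weighted per-modality features and a stage-level fusion feature."""

    def __init__(self, channels: int):
        super().__init__()
        # Channel-attention-style weighting over the concatenated modalities
        # (an assumed realization of "weight-aware").
        self.weight_net = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, 2 * channels, kernel_size=1),
            nn.Sigmoid(),
        )
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, feat_vis, feat_ir):
        cat = torch.cat([feat_vis, feat_ir], dim=1)
        w = self.weight_net(cat)                     # per-channel weights in (0, 1)
        w_vis, w_ir = torch.chunk(w, 2, dim=1)
        weighted_vis = feat_vis * w_vis              # weighted visible feature
        weighted_ir = feat_ir * w_ir                 # weighted infrared feature
        fused = self.fuse(cat * w)                   # fusion feature for this stage
        return weighted_vis, weighted_ir, fused


class CrossGuidedStage(nn.Module):
    """One feature-generation block per modality plus a weight-aware module."""

    def __init__(self, channels: int):
        super().__init__()
        self.vis_block = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
        self.ir_block = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
        self.weight_aware = WeightAwareModule(channels)

    def forward(self, vis, ir, prev_w_vis=None, prev_w_ir=None, prev_fused=None):
        # Cross-guidance: the previous stage's weighted feature of the *other*
        # modality is fed back into this modality's generation path.
        if prev_w_ir is not None:
            vis = vis + prev_w_ir
        if prev_w_vis is not None:
            ir = ir + prev_w_vis
        feat_vis = self.vis_block(vis)
        feat_ir = self.ir_block(ir)
        w_vis, w_ir, fused = self.weight_aware(feat_vis, feat_ir)
        if prev_fused is not None:
            # Couple fusion features across stages (simple additive link here).
            fused = fused + prev_fused
        return feat_vis, feat_ir, w_vis, w_i_r, fused if False else (feat_vis, feat_ir, w_vis, w_ir, fused)


# Toy two-stage forward pass at a fixed width of 32 channels.
stage1, stage2 = CrossGuidedStage(32), CrossGuidedStage(32)
vis, ir = torch.randn(1, 32, 64, 80), torch.randn(1, 32, 64, 80)
f_v, f_i, w_v, w_i, fused = stage1(vis, ir)
f_v, f_i, w_v, w_i, fused = stage2(f_v, f_i, prev_w_vis=w_v, prev_w_ir=w_i, prev_fused=fused)
```

In a full detector, the per-stage modality-specific features and fusion features would then feed a detection head; that part is omitted here since the abstract does not describe its structure.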
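As a quick consistency check on the reported numbers (assuming the 29.77% refers to a relative reduction of the log-average miss rate rather than an absolute drop in percentage points, which the abstract does not state), the implied baseline miss rate would be roughly:

```latex
\mathrm{MR}_{\text{baseline}} \approx \frac{\mathrm{MR}_{\text{proposed}}}{1 - 0.2977}
                              = \frac{25.03\%}{0.7023}
                              \approx 35.6\%
```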
| ISSN: | 1051-2004; 1095-4333 |
|---|---|
| DOI: | 10.1016/j.dsp.2021.103370 |