Gated weighted normative feature fusion for multispectral object detection

Bibliographic details
Published in:	The Visual computer 2024-09, Vol.40 (9), p.6409-6419
Main authors:	Wu, Xianjun, Jiang, Xian, Dong, Ligang
Format:	Article
Language:	English
Subjects:	Ablation; Accuracy; Artificial Intelligence; Computer Graphics; Computer Science; Datasets; Effectiveness; Efficiency; Feature extraction; Image Processing and Computer Vision; Methods; Neural networks; Object recognition; Optimization; Original Article; Pedestrians; Sensors; Target detection
Online access:	Full text

Description
Abstract:	Multispectral image pairs provide independent, complementary information that describes detection targets more comprehensively, improving the robustness and reliability of object detectors. Detector performance depends on how cross-modality features are extracted and fused. To fully exploit the different modalities, we propose a lightweight yet effective cross-modality feature fusion approach named gated weighted normative feature fusion. In the feature extraction stage, our proposed dual-input backbone network extracts richer and more useful features. In the feature fusion stage, the fusion module eliminates redundant features, dynamically weighs the importance of the two image features, and then normalizes the fused features. Experiments and ablation studies on several publicly available datasets demonstrate the effectiveness of our method. Our method improves performance, reaching an mAP50 of 80.3% and an mAP of 41.8% on the FLIR dataset, and an mAP50 of 98.0% and an mAP of 68.0% on the LLVIP dataset. In particular, the inference speed of our method is twice that of the current state-of-the-art method.
ISSN:	0178-2789 1432-2315
DOI:	10.1007/s00371-023-03173-6
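
Illustrative sketch: the abstract names three steps in the fusion stage (eliminating redundant features, dynamically weighing the two image features, and normalizing the fused result) but gives no equations. The Python below is only a minimal sketch of how such a pipeline might be wired together, not the authors' module. Everything specific in it is an assumption: the function name `gated_weighted_fusion`, the sigmoid gate computed from each channel's mean activation, the softmax weighting over the mean energy of each modality, and the per-channel standardization. A real detector would use learned parameters for the gates and weights.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def mean(xs):
    return sum(xs) / len(xs)


def gated_weighted_fusion(rgb, thermal, eps=1e-5):
    """Fuse two feature maps given as lists of channels.

    Each channel is a flat list of floats (spatial positions flattened).
    Hypothetical helper for illustration; not the paper's implementation.
    """

    def gate(feat):
        # Gating (assumed form): scale every channel by the sigmoid of its
        # mean activation, so weakly activated channels are suppressed.
        return [[v * sigmoid(mean(ch)) for v in ch] for ch in feat]

    def energy(feat):
        # Mean squared activation over all channels and positions.
        return mean([mean([v * v for v in ch]) for ch in feat])

    gated_rgb, gated_th = gate(rgb), gate(thermal)

    # Dynamic weighting (assumed form): a two-way softmax over the energy
    # of each gated modality, shifted by the max for numerical stability.
    s_rgb, s_th = energy(gated_rgb), energy(gated_th)
    m = max(s_rgb, s_th)
    e_rgb, e_th = math.exp(s_rgb - m), math.exp(s_th - m)
    w_rgb = e_rgb / (e_rgb + e_th)
    w_th = e_th / (e_rgb + e_th)

    fused = [
        [w_rgb * a + w_th * b for a, b in zip(ch_rgb, ch_th)]
        for ch_rgb, ch_th in zip(gated_rgb, gated_th)
    ]

    # Normalization (assumed form): zero mean and unit variance per channel.
    normalized = []
    for ch in fused:
        mu = mean(ch)
        var = mean([(v - mu) ** 2 for v in ch])
        normalized.append([(v - mu) / math.sqrt(var + eps) for v in ch])

    return normalized, (w_rgb, w_th)
```

Calling the function with two feature maps of the same shape returns the normalized fused map together with the two modality weights, which sum to 1. The gate and weighting rules here are fixed functions only to keep the sketch dependency-free.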