Rethinking Self-Attention for Multispectral Object Detection
Saved in:
Published in: | IEEE Transactions on Intelligent Transportation Systems, 2024-11, Vol. 25 (11), p. 16300-16311 |
---|---|
Main authors: | , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Order full text |
Abstract: | Data from different modalities, such as infrared and visible images, can offer complementary information, and integrating such information can significantly enhance a system's ability to perceive and recognize its surroundings. Thus, multi-modal object detection has widespread applications, particularly in challenging conditions such as adverse weather and low-light scenarios. The core of multi-modal fusion lies in developing a sound fusion strategy that fully exploits the complementary features of different modalities while preventing a significant increase in model complexity. To this end, this paper proposes a novel lightweight cross-fusion module named Channel-Patch Cross Fusion (CPCF), which leverages Channel-wise Cross-Attention (CCA), Patch-wise Cross-Attention (PCA), and Adaptive Gating (AG) to encourage mutual rectification among different modalities. This process simultaneously explores commonalities across modalities while maintaining the uniqueness of each modality. Furthermore, we design a versatile intermediate fusion framework that can leverage CPCF to enhance the performance of multi-modal object detection. The proposed method is extensively evaluated on multiple public multi-modal datasets, namely FLIR, LLVIP, and DroneVehicle. The experiments indicate that our method yields consistent performance gains across various benchmarks and can be extended to different types of detectors, further demonstrating its robustness and generalizability. Our code is available at https://github.com/Superjie13/CPCF_Multispectral. |
---|---|
ISSN: | 1524-9050, 1558-0016 |
DOI: | 10.1109/TITS.2024.3412417 |
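The fusion strategy summarized in the abstract (channel-wise cross-attention, patch-wise cross-attention, and an adaptive gate blending the two) can be illustrated with a minimal NumPy sketch. Everything here is an illustrative assumption: the function names, tensor shapes, and the scalar gating formula are not taken from the paper; the authors' actual implementation lives in the linked repository.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def channel_cross_attention(x, y):
    """Channel-wise cross-attention sketch: each channel of modality x
    gathers information from the most similar channels of modality y.
    x, y: (C, N) feature maps with spatial dims flattened to N = H*W."""
    attn = softmax(x @ y.T / np.sqrt(x.shape[1]), axis=-1)  # (C, C)
    return attn @ y  # (C, N): y's channels re-mixed for x

def patch_cross_attention(x, y):
    """Patch-wise cross-attention sketch: the same idea applied over
    spatial positions instead of channels."""
    attn = softmax(x.T @ y / np.sqrt(x.shape[0]), axis=-1)  # (N, N)
    return (attn @ y.T).T  # (C, N): y's patches re-mixed for x

def adaptive_gate(x, cca_out, pca_out):
    """Adaptive-gating sketch: a sigmoid gate (here a single scalar,
    purely for illustration) blends the two rectified features before
    adding them residually to the original modality feature."""
    g = 1.0 / (1.0 + np.exp(-(cca_out + pca_out).mean()))
    return x + g * cca_out + (1.0 - g) * pca_out

# Toy usage: "infrared" feature x rectified by "visible" feature y.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 6))  # C=4 channels, N=6 positions
y = rng.standard_normal((4, 6))
fused = adaptive_gate(x, channel_cross_attention(x, y),
                      patch_cross_attention(x, y))
```

The sketch keeps the output shape equal to the input shape, so such a module could slot between the backbone stages of an existing detector, consistent with the "intermediate fusion framework" the abstract describes.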