YOLO-MS: Multispectral Object Detection via Feature Interaction and Self-Attention Guided Fusion


Detailed Description

Bibliographic Details
Published in: IEEE Transactions on Cognitive and Developmental Systems, 2023-12, Vol. 15 (4), p. 2132-2143
Authors: Xie, Yumin; Zhang, Langwen; Yu, Xiaoyuan; Xie, Wei
Format: Article
Language: English
Online Access: Order full text
Description
Summary: Object detection is essential for an autonomous driving sensing system. Since lighting conditions change in unconstrained scenarios, detection accuracy based on visible images alone can degrade greatly. Although accuracy can be improved by fusing visible and infrared images, existing multispectral object detection (MOD) algorithms suffer from inadequate intermodal interaction and a lack of global dependence in the fusion approach. Thus, we propose an MOD framework called YOLO-MS by designing a feature interaction and self-attention fusion network (FISAFN) as the backbone network. Within the FISAFN, correlations between the two modalities are extracted by the feature interaction module (FIM), which reconstructs the information components of each modality and enhances the capability of information exchange. To filter redundant features and enhance complementary features, long-range information dependence between the two modalities is established by a self-attention feature fusion module (SAFFM), yielding better information richness in the fused features. Experimental results on the FLIR-aligned data set and the M3FD data set demonstrate that the proposed YOLO-MS performs favorably against state-of-the-art approaches, including feature-level fusion and pixel-level fusion. Furthermore, the proposed YOLO-MS exhibits good detection performance under diverse scene conditions.
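To make the cross-modal self-attention idea in the abstract concrete, here is a minimal NumPy sketch of fusing visible and infrared feature tokens with scaled dot-product cross-attention. This is an illustrative assumption, not the paper's actual SAFFM: the function name, the symmetric fusion rule, and the absence of learned projection matrices are all simplifications for exposition.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention_fusion(f_vis, f_ir):
    """Hypothetical sketch: fuse visible and infrared feature
    tokens via scaled dot-product cross-attention, so each token
    gains long-range dependence on the other modality."""
    d = f_vis.shape[-1]
    # Visible tokens attend over all infrared tokens (global dependence).
    attn_v2i = softmax(f_vis @ f_ir.T / np.sqrt(d))
    # Infrared tokens attend over all visible tokens.
    attn_i2v = softmax(f_ir @ f_vis.T / np.sqrt(d))
    # Residual additions inject complementary cross-modal information.
    fused_vis = f_vis + attn_v2i @ f_ir
    fused_ir = f_ir + attn_i2v @ f_vis
    # Average the two enriched streams into one fused representation.
    return (fused_vis + fused_ir) / 2

rng = np.random.default_rng(0)
f_vis = rng.standard_normal((16, 32))  # 16 tokens, 32-dim visible features
f_ir = rng.standard_normal((16, 32))   # 16 tokens, 32-dim infrared features
fused = cross_modal_attention_fusion(f_vis, f_ir)
print(fused.shape)  # (16, 32)
```

In the paper's actual module, queries, keys, and values would come from learned projections and the fusion would sit inside the FISAFN backbone; the sketch only shows why attention gives each modality a global view of the other.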
ISSN: 2379-8920, 2379-8939
DOI: 10.1109/TCDS.2023.3238181