Global contextually guided lightweight network for RGB-thermal urban scene understanding
Published in: | Engineering Applications of Artificial Intelligence, 2023-01, Vol. 117, p. 105510, Article 105510 |
---|---|
Main Authors: | , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Online Access: | Full text |
Abstract: | Recent achievements in scene understanding have benefited considerably from the rapid development of convolutional neural networks. However, the practical deployment of scene-understanding methods, especially on mobile devices, remains restricted by their high computational costs and memory consumption. Existing networks typically integrate RGB and thermal (RGB-T) cues through simple fusion, resulting in insufficient exploitation of the complicated correlations between the two image modalities. Moreover, some of these methods do not consider the influence of global features on the interactions between low- and high-level features. Hence, in this study, we introduce a novel network named the global contextually guided lightweight network (GCGLNet), which has fewer parameters and higher speed while maintaining accuracy. Specifically, a secondary cross-modal integration is introduced to remove redundant information while fusing and propagating effective modal information. A hybrid feature-cascaded aggregation module is also introduced to emphasize the global context along with complementation and calibration between the high- and low-level features. Extensive experiments conducted on two benchmark RGB-T datasets demonstrate that the proposed GCGLNet yields accuracy comparable with that of state-of-the-art approaches while operating at 51.89 FPS on 480 × 640 pixel inputs with only 7.87 M parameters. Thus, GCGLNet is expected to open new avenues for research on urban scene understanding via RGB-T sensors. |
---|---|
ISSN: | 0952-1976 (print), 1873-6769 (online) |
DOI: | 10.1016/j.engappai.2022.105510 |