Semantic-Injected Bidirectional Multiscale Flow Estimation Network for Infrared and Visible Image Registration

Bibliographic details
Published in: IEEE journal of selected topics in applied earth observations and remote sensing 2025-01, Vol.18, p.1-10
Main authors: Tian, Chunna, Xu, Liuwei, Li, Xiangyang, Zhou, Heng, Song, Xiqun
Format: Article
Language: English
Description
Summary: Infrared and visible image registration ensures consistency in spatial positions across different modalities. Cross-modal images contain objects at different scales and cluttered backgrounds. However, most existing image registration methods adopt the same alignment strategy for different objects, which leads to insufficient multiscale feature representation and inaccurate registration of foreground objects. To address these issues, we propose a semantic-injected bidirectional multiscale flow estimation (SI-BMFE) network for infrared and visible image registration. SI-BMFE leverages feature complementarity across different scales and employs a pretrained segmentation network to extract the spatial positions of foreground objects to improve registration accuracy. Specifically, we first design a bidirectional multiscale feature enhancement (BMFE) module to integrate feature complementarity across different scales, effectively extracting both global structures and local details. BMFE pushes the network to roughly align infrared and visible images. Then, the semantic-injected flow estimation (SFE) module is introduced to estimate multi-level deformation fields for fine-grained registration of different objects. SFE utilizes a pretrained segmentation network to obtain spatial location information of foreground objects. Object location cues help the network distinguish different foreground objects from the background and focus on them. SFE exploits semantic knowledge to promote fine alignment of different foreground objects and improve the accuracy of cross-modal image registration. Extensive experiments demonstrate that our proposed method outperforms state-of-the-art registration networks on both the MSRS and RoadScene infrared and visible image registration datasets.
ISSN:1939-1404
2151-1535
DOI:10.1109/JSTARS.2025.3527175
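
The summary describes coarse alignment followed by multi-level deformation fields and the use of foreground location cues. As a rough illustration of what applying such fields can look like, the following NumPy sketch warps an image with a dense flow field, lifts a coarse field to a finer resolution, and scores alignment with extra weight on foreground pixels. It is not the SI-BMFE implementation: the function names `warp`, `upsample_flow`, and `masked_error`, the bilinear sampling, the nearest-neighbour upsampling, and the `fg_weight` parameter are all assumptions made for this example, and the record does not state how the network injects semantic information.

```python
import numpy as np


def warp(image, flow):
    """Backward-warp a 2-D image with a dense flow field (bilinear sampling).

    flow[..., 0] holds x displacements and flow[..., 1] holds y displacements,
    both in pixels. Sample positions are clamped to the image border.
    """
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    x = np.clip(xs + flow[..., 0], 0, w - 1)
    y = np.clip(ys + flow[..., 1], 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = x - x0
    wy = y - y0
    top = image[y0, x0] * (1 - wx) + image[y0, x1] * wx
    bottom = image[y1, x0] * (1 - wx) + image[y1, x1] * wx
    return top * (1 - wy) + bottom * wy


def upsample_flow(flow, factor=2):
    """Nearest-neighbour upsampling of a flow field (hypothetical helper).

    Displacements are multiplied by `factor` because they are measured in
    pixels of the finer grid after upsampling.
    """
    up = np.repeat(np.repeat(flow, factor, axis=0), factor, axis=1)
    return up * factor


def masked_error(a, b, mask, fg_weight=2.0):
    """Weighted mean absolute error; foreground (mask == 1) counts more.

    The weighting scheme is an assumption for illustration only.
    """
    weights = 1.0 + (fg_weight - 1.0) * mask
    return float(np.sum(weights * np.abs(a - b)) / np.sum(weights))


# Toy coarse-to-fine usage: a coarse field at half resolution is upsampled
# and a residual fine-level field is added before warping.
moving = np.arange(16, dtype=np.float64).reshape(4, 4)
coarse = np.zeros((2, 2, 2))
coarse[..., 0] = 0.5                      # half a coarse pixel to the right
fine_residual = np.zeros((4, 4, 2))       # no extra fine-level correction
total_flow = upsample_flow(coarse) + fine_residual
registered = warp(moving, total_flow)
```

In this toy case the upsampled field shifts sampling by one pixel along x, so each output pixel takes the value of its right-hand neighbour, except in the last column where sampling is clamped to the border.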