SeLo v2: Towards for Higher and Faster Semantic Localization

Detailed description

Bibliographic details
Published in:	IEEE geoscience and remote sensing letters 2023-06, p.1-1
Main authors:	Yu, Miao; Yuan, Heqiang; Chen, Jialiang; Hao, Chongyang; Wang, Zhe; Yuan, Zhiqiang; Lu, Bin
Format:	Article
Language:	English
Subjects:	chaotic self-feeding mechanism; Feature extraction; Location awareness; Maximum likelihood estimation; multilevel likelihood expansion; Probability distribution; Remote sensing; remote sensing cross-modal retrieval; Semantic localization; Semantics; Task analysis

Description
Abstract:	Semantic localization (SeLo) is the task of locating the most relevant position in a remote sensing (RS) image based on the semantic information contained in the retrieved text; it is an emerging task built on cross-modal retrieval. SeLo achieves pixel-level semantic retrieval using only caption-level annotations, which plays an important role in RS image-text retrieval. Although some research on SeLo exists, the current SeLo framework is still largely brute-force and consumes a lot of resources. In this paper, building on the existing work, we conduct a more in-depth exploration and propose SeLo v2, which greatly improves the speed and accuracy of the SeLo task. First, exploiting the regional consistency of remote sensing images, we propose a multilevel likelihood expansion (MLE) that reduces the number of cross-modal similarity calculations; combining single-scale clipping with multi-point multi-scale expansion greatly improves the speed of the SeLo task. Next, to enhance accuracy, we propose the chaotic self-feeding mechanism (CSM), which performs semantic frequency enhancement to fully extract the semantic information contained in the caption, making the generated SeLo map more detailed and accurate. Compared with the initial SeLo framework, SeLo v2 achieves better semantic localization performance in only half the time, demonstrating its advantages in both time and accuracy. The proposed SeLo v2 is available at Link.
ISSN:	1545-598X
DOI:	10.1109/LGRS.2023.3288632
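
Illustrative note: the abstract does not give the algorithms for MLE or CSM, so the sketch below does not implement either of them. It only shows, under stated assumptions, the general kind of crop-and-score loop that a pixel-level localization map can be built from, and why the number of cross-modal similarity calculations drives the cost. The function names `selo_map_sketch` and `toy_score`, the window and stride values, and the mean-brightness score that stands in for a real image-text similarity model are all hypothetical.

```python
import numpy as np


def selo_map_sketch(image, score_fn, window=64, stride=32):
    """Average per-crop scores into a pixel-level map.

    Hypothetical illustration only: score_fn stands in for a cross-modal
    (image crop vs. caption) similarity model, which is the expensive call.
    Returns the averaged map and the number of score_fn calls made.
    """
    h, w = image.shape[:2]
    total = np.zeros((h, w), dtype=np.float64)  # summed scores per pixel
    hits = np.zeros((h, w), dtype=np.float64)   # number of crops covering each pixel
    calls = 0
    for top in range(0, max(h - window, 0) + 1, stride):
        for left in range(0, max(w - window, 0) + 1, stride):
            crop = image[top:top + window, left:left + window]
            score = score_fn(crop)  # one similarity calculation per crop
            calls += 1
            total[top:top + window, left:left + window] += score
            hits[top:top + window, left:left + window] += 1
    # Pixels never covered by a crop keep the value 0.
    averaged = np.divide(total, hits, out=np.zeros_like(total), where=hits > 0)
    return averaged, calls


def toy_score(crop):
    """Assumed stand-in for image-text similarity: mean brightness of the crop."""
    return float(crop.mean())


# Toy image: dark everywhere except a bright lower-right quadrant.
image = np.zeros((128, 128))
image[64:, 64:] = 1.0

dense_map, dense_calls = selo_map_sketch(image, toy_score, window=64, stride=32)
sparse_map, sparse_calls = selo_map_sketch(image, toy_score, window=64, stride=64)
print(dense_calls, sparse_calls)
```

In this toy setting, a larger stride means fewer calls to the scoring function at the price of a coarser map. The abstract reports that SeLo v2 reduces the number of similarity calculations while also improving accuracy, but how it does so is described only in the full text.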