Content-Guided and Class-Oriented Learning for VHR Image Semantic Segmentation

Detailed description

Bibliographic details
Published in:	IEEE transactions on geoscience and remote sensing 2024, Vol.62, p.1-15
Main authors:	Liu, Fang; Liu, Keming; Liu, Jia; Yang, Jingxiang; Tang, Xu; Xiao, Liang
Format:	Article
Language:	English (eng)
Subjects:	Aggregates; Aggregation; Class-oriented; content-guided; Convolution; Deformation effects; Embedding; Feature extraction; Formability; Image processing; Image resolution; Image segmentation; Information processing; Learning; Modules; Pixels; Qualitative analysis; Remote sensing; remote sensing (RS); Semantic segmentation; Semantics; Transformers; Trees; very high-resolution (VHR) image

Description
Abstract:	With advances in remote sensing (RS) platforms, very high-resolution (VHR) images have become increasingly common in recent years. They benefit semantic segmentation but also bring new challenges. Small objects, such as cars and trees, occupy only a few pixels in VHR images and are usually hard to segment. Moreover, confusion between similar ground objects, such as low vegetation and trees, often degrades performance. In this article, a content-guided and class-oriented network (CGCO-Net) for VHR image semantic segmentation is proposed to tackle these problems. Specifically, an adaptive content-guided fusion (ACGF) module with deformable convolution is introduced to capture long-distance dependencies and perform spatial aggregation effectively. Guided by the high-level features, semantic content knowledge is gradually aggregated into the low-level features while the details of the original features are preserved. In addition, a multiscale channel alignment module is introduced into the encoder-decoder structure to further extract long-range context information and reduce the computational cost. To improve pixel-level classification, a class-oriented representation learning (CORL) scheme is designed with transformer blocks, using class embedding and deep supervision, which gradually enhances discrimination and benefits the final segmentation. Furthermore, a weighted loss function and a threshold optimization strategy are employed to alleviate the sample imbalance problem. Tested on three public datasets and compared with several state-of-the-art methods, the proposed CGCO-Net performs well in both qualitative and quantitative analysis.
ISSN:	0196-2892 1558-0644
DOI:	10.1109/TGRS.2024.3460081
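
Note on the weighted loss: the abstract states that a weighted loss function is used against sample imbalance, but this record gives no formula. The following is only a minimal sketch of one common choice, assuming inverse-frequency class weights applied to a per-pixel cross-entropy; it is not taken from the article, and the function names `inverse_frequency_weights` and `weighted_cross_entropy` are hypothetical.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels, num_classes):
    """Assumed weighting: total / (num_classes * count) per class.

    Rare classes (few pixels) receive larger weights; a class that never
    occurs in `labels` gets weight 0.0.
    """
    counts = Counter(labels)
    total = len(labels)
    return [
        total / (num_classes * counts[c]) if counts[c] else 0.0
        for c in range(num_classes)
    ]


def weighted_cross_entropy(probs, labels, weights, eps=1e-12):
    """Weighted average of -log p(true class) over all pixels.

    probs:   one list of class probabilities per pixel
    labels:  the true class index of each pixel
    weights: one weight per class
    """
    weighted_sum = sum(
        weights[y] * -math.log(max(p[y], eps)) for p, y in zip(probs, labels)
    )
    weight_total = sum(weights[y] for y in labels)
    return weighted_sum / weight_total


# Toy example: class 0 covers three pixels, class 1 only one.
labels = [0, 0, 0, 1]
weights = inverse_frequency_weights(labels, num_classes=2)
probs = [[0.5, 0.5]] * 4
loss = weighted_cross_entropy(probs, labels, weights)
```

In this toy example the single pixel of class 1 is weighted three times as heavily as each pixel of class 0, so both classes contribute equally to the loss in total.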