Weakly-Supervised Semantic Segmentation in Aerial Imagery via Explicit Pixel-Level Constraints
Published in: IEEE Transactions on Geoscience and Remote Sensing, 2022, Vol. 60, pp. 1-1
Main authors: , , , , , ,
Format: Article
Language: English
Abstract: In recent years, image-level weakly supervised semantic segmentation (WSSS) has developed rapidly in natural scenes due to the easy availability of classification tags. However, owing to the complex backgrounds, multi-category scenes, and dense small targets in remote sensing (RS) images, relatively little research has been conducted in this field. To alleviate the impact of these problems in RS scenes, a self-supervised Siamese network based on an explicit pixel-level constraints framework is proposed, which greatly improves the quality of class activation maps and the positioning accuracy in multi-category RS scenes. Specifically, three novel components push performance to a new level: (a) a pixel-soft classification loss, which imposes explicit constraints on pixels during image-level training; (b) a pixel global awareness module, which captures high-level semantic context and low-level pixel spatial information to improve the consistency and accuracy of RS object segmentation; and (c) a dynamic multi-scale fusion module with a gating mechanism, which enhances feature representation and improves the positioning accuracy of RS objects, particularly small and dense ones. Experiments on two challenging RS datasets demonstrate that the proposed modules achieve new state-of-the-art results using only image-level labels, improving mIoU to 36.79% on iSAID and 45.43% on ISPRS in the WSSS task. To the best of our knowledge, this is the first work to perform image-level WSSS on multi-class RS scenes.
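The abstract's item (c) describes fusing features from multiple scales through a gating mechanism. The sketch below is a hedged, minimal NumPy illustration of the general idea (soft gates, normalized across scales, weighting a sum of same-shaped feature maps), not a reproduction of the paper's actual module; the function name `gated_multiscale_fusion` and the use of a plain softmax over scalar per-scale logits are assumptions for illustration only.

```python
import numpy as np

def gated_multiscale_fusion(features, gate_logits):
    """Fuse same-shaped feature maps from several scales via soft gating.

    Hedged sketch of the idea, not the paper's exact module: per-scale gate
    logits are normalized with a softmax so the gates sum to 1, and the fused
    map is the gate-weighted sum over scales: sum_s g_s * F_s.
    """
    logits = np.asarray(gate_logits, dtype=float)
    gates = np.exp(logits - logits.max())   # stabilized exponentials
    gates /= gates.sum()                    # softmax over scales
    fused = np.zeros_like(features[0], dtype=float)
    for g, f in zip(gates, features):       # gate-weighted accumulation
        fused += g * f
    return fused, gates

# Toy example: three scales already resampled to a common (1, 2, 2) grid.
feats = [np.full((1, 2, 2), v, dtype=float) for v in (1.0, 2.0, 3.0)]
fused, gates = gated_multiscale_fusion(feats, gate_logits=[0.0, 0.0, 0.0])
# Equal logits give uniform gates, so the fused map is the mean of the scales.
```

In a real network the gate logits would themselves be predicted from the input (that is the "dynamic" part), letting the model emphasize fine scales for small, dense objects.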
ISSN: 0196-2892, 1558-0644
DOI: 10.1109/TGRS.2022.3224477