Beyond Pixel-level Annotation: Exploring Self-supervised Learning for Change Detection with Image-Level Supervision

Bibliographic details
Published in:	IEEE Transactions on Geoscience and Remote Sensing, 2024-01, Vol. 62, p. 1-1
Main authors:	Zhao, Maofan; Hu, Xinli; Zhang, Linlin; Meng, Qingyan; Chen, Yuxing; Bruzzone, Lorenzo
Format:	Article
Language:	English
Keywords:	Annotations; Artificial neural networks; Cams; Change detection; contrastive learning; Detection; equivariant regularization; Feature extraction; Labels; Labour; Learning; mutual learning; Pixels; Prototypes; Regularization; Remote sensing; Self-supervised learning; Supervised learning; Task analysis; Training; weakly supervised

Description
Abstract:	Change detection (CD) in high-resolution remote sensing has received considerable attention due to its wide range of applications. Many methods have been proposed in the literature and have achieved excellent performance. However, they are often fully supervised and thus require abundant pixel-level labeled samples, which are time-consuming and labor-intensive to produce. Labeling bi-temporal images is often more complicated than labeling for common single-temporal interpretation. Therefore, this study adopts weakly supervised learning (WSL) to reduce label acquisition costs. However, changed regions are small, fragmented, and similar to the background, which widens the gap between weakly supervised and fully supervised tasks. To address these difficulties, we explore self-supervised methods to construct a WSL framework based on image-level labels for general CD, termed WSLCD. First, we design a double-branch siamese network that takes the original image pair and the spatially transformed image pair as input and derives embeddings and initial class attention maps (CAMs). Second, mutual learning and equivariant regularization (MLER) is enforced on CAMs from different views; it imposes consistency constraints in confusion regions and lets the CAMs learn from each other based on saliency regions. Furthermore, prototype-based contrastive learning (PCL) is designed so that unreliable pixels can learn from prototypes computed from reliable pixels. PCL includes intra-view contrast and cross-view contrast, depending on whether the prototypes and class embeddings come from the same view. With these strategies, we narrow the gap between image-level weakly supervised CD and fully supervised CD. Experiments are conducted on three CD datasets: CLCD, DSIFN, and GCD. Our method achieves state-of-the-art performance on pseudo label generation and CD. The code is available at https://github.com/mfzhao1998/WSLCD.
ISSN:	0196-2892; 1558-0644
DOI:	10.1109/TGRS.2024.3379431
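
The two self-supervised ideas named in the abstract, equivariant regularization on CAMs and prototype-based contrast, can be illustrated with a small NumPy sketch. This is not the authors' implementation (their code is at the repository linked in the abstract). The horizontal flip used as the spatial transform, the 0.7 reliability threshold, the temperature of 0.1, the InfoNCE-style loss form, and every function name below are assumptions chosen only for illustration.

```python
import numpy as np

def hflip(x):
    """Spatial transform for the second view: horizontal flip (an assumption)."""
    return x[..., ::-1]

def equivariance_loss(cam_orig, cam_trans, transform=hflip):
    """Mean absolute gap between transform(CAM(x)) and CAM(transform(x))."""
    return float(np.mean(np.abs(transform(cam_orig) - cam_trans)))

def class_prototypes(emb, cam, thresh=0.7):
    """Average the embeddings of 'reliable' pixels for each class.

    emb: (D, H, W) pixel embeddings; cam: (C, H, W) scores in [0, 1].
    A pixel is treated as reliable for class c when cam[c] >= thresh
    (a hypothetical rule, not taken from the paper).
    """
    flat_emb = emb.reshape(emb.shape[0], -1)          # (D, N)
    flat_cam = cam.reshape(cam.shape[0], -1)          # (C, N)
    mask = (flat_cam >= thresh).astype(flat_emb.dtype)
    counts = np.maximum(mask.sum(axis=1, keepdims=True), 1.0)
    return mask @ flat_emb.T / counts                 # (C, D)

def prototype_contrast_loss(emb, protos, labels, temperature=0.1):
    """InfoNCE-style loss pulling each pixel toward its class prototype.

    labels: (H, W) integer pseudo labels indexing rows of protos.
    """
    x = emb.reshape(emb.shape[0], -1).T               # (N, D)
    x = x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-8)
    p = protos / (np.linalg.norm(protos, axis=1, keepdims=True) + 1e-8)
    logits = x @ p.T / temperature                    # (N, C) cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    y = labels.reshape(-1)
    return float(-log_prob[np.arange(y.size), y].mean())

# Toy example: 2 classes (unchanged / changed), 2-D embeddings, a 1x4 image.
cam = np.array([[[1.0, 1.0, 0.0, 0.0]],
                [[0.0, 0.0, 1.0, 1.0]]])
emb = cam.copy()                                      # embeddings aligned with classes
labels = np.array([[0, 0, 1, 1]])
protos = class_prototypes(emb, cam)
intra_view = prototype_contrast_loss(emb, protos, labels)
cross_view = prototype_contrast_loss(hflip(emb), protos, hflip(labels))
```

In this sketch, intra-view contrast pairs prototypes and embeddings from the same view, while cross-view contrast reuses the prototypes of one view with the embeddings (and correspondingly transformed pseudo labels) of the other.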