SEMPNet: enhancing few-shot remote sensing image semantic segmentation through the integration of the segment anything model
Published in: | GIScience and remote sensing 2024-12, Vol.61 (1) |
---|---|
Main authors: | , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | Few-shot semantic segmentation has attracted increasing attention because it depends only weakly on annotated samples. Although extensively explored in the computer vision community, these techniques are primarily designed for natural images and therefore generalize poorly to remote sensing images. In contrast to the mostly individual and distinct objects in natural images, remote sensing images often feature clustered and regular patterns of objects. To bridge this gap, we propose a novel approach for few-shot remote sensing image semantic segmentation that takes the specific characteristics of remote sensing imagery into account. Our approach introduces a mask classification pipeline, which first extracts all independent objects within an image and then assigns a category to each object, guided by semantic information derived from a few support images. This requires a robust mask extractor, a role the segment anything model (SAM) has the potential to fill. Leveraging its zero-shot segmentation capabilities, we present the SAM-enhanced mask parsing network (SEMPNet), a novel few-shot remote sensing image semantic segmentation model. The method generates a set of masks for each image using SAM, transforming the segmentation task into a mask classification problem. To classify each mask precisely, we calculate pixel-wise correlations between each mask and the support features through cross-image position attention. Finally, a mask parsing module decodes the correlation maps and generates the segmentation results. Experiments on two remote sensing datasets demonstrate the superiority of our method. Our code will be available at https://github.com/TinyAway/SEMPNet. |
---|---|
ISSN: | 1548-1603, 1943-7226 |
DOI: | 10.1080/15481603.2024.2426589 |
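
The mask classification idea described in the summary (extract candidate object masks first, then label each mask using information from the support images) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the candidate masks are assumed to come from a mask extractor such as SAM, the features from any backbone, and a plain cosine similarity against a support prototype stands in for the paper's cross-image position attention and mask parsing module. All function names, the `threshold` parameter, and the toy data are hypothetical.

```python
import numpy as np


def masked_average_prototype(features, mask):
    """Average the (H, W, C) feature vectors under a binary (H, W) mask -> (C,)."""
    weights = mask.astype(features.dtype)
    total = weights.sum()
    if total == 0:
        raise ValueError("support mask is empty")
    return (features * weights[..., None]).sum(axis=(0, 1)) / total


def cosine_similarity_map(features, prototype, eps=1e-8):
    """Per-pixel cosine similarity between (H, W, C) features and a (C,) prototype."""
    feat_norm = np.linalg.norm(features, axis=-1)
    proto_norm = np.linalg.norm(prototype)
    return (features @ prototype) / (feat_norm * proto_norm + eps)


def classify_masks(query_features, candidate_masks, prototype, threshold=0.5):
    """Score each candidate mask by its mean similarity and merge the positives.

    Simplified stand-in (assumption): the paper decodes correlation maps with a
    learned mask parsing module; here a fixed threshold is used instead.
    """
    similarity = cosine_similarity_map(query_features, prototype)
    segmentation = np.zeros(similarity.shape, dtype=bool)
    scores = []
    for mask in candidate_masks:
        mask = mask.astype(bool)
        score = float(similarity[mask].mean()) if mask.any() else 0.0
        scores.append(score)
        if score > threshold:
            segmentation |= mask
    return segmentation, scores


# Toy data (hypothetical): 4x4 feature maps with 2 channels.
support_features = np.zeros((4, 4, 2))
support_features[:, :2] = [1.0, 0.0]   # target class on the left half
support_features[:, 2:] = [0.0, 1.0]   # background on the right half
support_mask = np.zeros((4, 4), dtype=bool)
support_mask[:, :2] = True

query_features = np.zeros((4, 4, 2))
query_features[..., 1] = 1.0           # background everywhere...
query_features[:2, :2] = [1.0, 0.0]    # ...except a target object top-left

# Candidate masks, assumed to be produced by a class-agnostic extractor.
object_mask = np.zeros((4, 4), dtype=bool)
object_mask[:2, :2] = True
distractor_mask = np.zeros((4, 4), dtype=bool)
distractor_mask[2:, 2:] = True

prototype = masked_average_prototype(support_features, support_mask)
segmentation, scores = classify_masks(
    query_features, [object_mask, distractor_mask], prototype
)
print(segmentation.astype(int))
print(scores)
```

In this toy setting only the top-left candidate matches the support prototype, so it alone is kept in the final segmentation. The design point the sketch tries to convey is that classification happens per mask rather than per pixel, which keeps object boundaries tied to the extractor's masks.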