SCTrans: Self-align and cross-align transformer for few-shot segmentation
Published in: | Image and vision computing 2024-02, Vol.142, p.104893, Article 104893 |
---|---|
Main authors: | , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | Few-shot semantic segmentation (FSS) aims to train a segmentation model that generalizes to novel categories from only a few labeled images. One challenge of FSS is the spatial inconsistency between support and query images, e.g., in appearance and texture. Most existing methods rely solely on semantic-level prototypes of the support images to guide mask prediction. These methods, however, focus only on the most discriminative regions of the object rather than on holistic feature representations. Another problem is the lack of interaction between paired support and query images. In this paper, we propose a self-align and cross-align transformer (SCTrans) to remedy these limitations. Specifically, we design a feature fusion module (FFM) that incorporates low-level information from the query branch into mid-level semantic features, boosting the semantic representations of query images. In addition, a feature alignment module is designed to bidirectionally propagate semantic information from support to query images conditioned on more representative support and query features, increasing both intra-class similarities and inter-class differences. Extensive experiments on PASCAL-5i and COCO-20i show that our SCTrans significantly advances the state of the art.
• We propose a novel self-align and cross-align transformer for few-shot segmentation.
• Our method effectively boosts feature representations and mines similar parts between support and query pairs.
• Our transformer-based method advances the state of the art. |
---|---|
ISSN: | 0262-8856 1872-8138 |
DOI: | 10.1016/j.imavis.2023.104893 |
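
The abstract contrasts SCTrans with methods that guide mask prediction using only a semantic-level prototype of the support image. The sketch below illustrates that generic prototype baseline, not SCTrans itself. It assumes the common formulation of masked average pooling followed by cosine similarity. The function names, the toy 2x2 feature maps, and the 0.5 threshold are all hypothetical choices for illustration and do not come from the paper.

```python
import math


def masked_average_pooling(features, mask):
    """Average the feature vectors at positions where the mask is 1.

    features: H x W grid of C-dimensional vectors (nested lists).
    mask:     H x W grid of 0/1 foreground labels for the support image.
    """
    channels = len(features[0][0])
    total = [0.0] * channels
    count = 0
    for feat_row, mask_row in zip(features, mask):
        for vec, m in zip(feat_row, mask_row):
            if m:
                count += 1
                for c in range(channels):
                    total[c] += vec[c]
    if count == 0:
        raise ValueError("support mask has no foreground pixels")
    return [t / count for t in total]


def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity between two vectors; eps guards against zero norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / (norm + eps)


def prototype_similarity_map(query_features, prototype):
    """Compare every query position against the single support prototype."""
    return [[cosine_similarity(vec, prototype) for vec in row]
            for row in query_features]


# Toy 2x2 feature maps with 2 channels (hypothetical values).
support_features = [[[1.0, 0.0], [0.8, 0.2]],
                    [[0.0, 1.0], [0.1, 0.9]]]
support_mask = [[1, 1],
                [0, 0]]
query_features = [[[0.9, 0.1], [0.0, 1.0]],
                  [[1.0, 0.0], [0.2, 0.8]]]

prototype = masked_average_pooling(support_features, support_mask)
similarity = prototype_similarity_map(query_features, prototype)

# Assumed threshold of 0.5; real systems typically learn a decoder instead.
predicted_mask = [[1 if s > 0.5 else 0 for s in row] for row in similarity]
```

Because the whole support object is collapsed into one vector, every query position is compared against the same average. This is one way to picture the limitation the abstract describes: a single prototype carries no spatial detail and gives the support and query features no chance to interact.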