Understanding Segment Anything Model: SAM is Biased Towards Texture Rather than Shape
Saved in:
Main Authors:
Format: Article
Language: English
Subjects:
Online Access: Order full text
Abstract: In contrast to human vision, which relies mainly on shape to recognize objects, deep image recognition models are widely known to be biased toward texture. Recently, the Meta research team released the first foundation model for image segmentation, termed the Segment Anything Model (SAM), which has attracted significant attention. In this work, we examine SAM from the perspective of texture vs. shape. Unlike label-oriented recognition tasks, SAM is trained to predict a mask covering an object's shape based on a prompt. Given this, it might seem self-evident that SAM is biased towards shape. In this work, however, we reveal an interesting finding: SAM is strongly biased towards texture-like dense features rather than shape. This intriguing finding is supported by a novel setup in which we disentangle texture and shape cues and design texture-shape cue conflicts for mask prediction.
DOI: 10.48550/arxiv.2311.11465
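
The abstract describes probing SAM with texture-shape cue-conflict images. Below is a minimal sketch of how such a probe might look, assuming a cue-conflict image and its ground-truth shape silhouette are prepared offline (e.g., by transferring one object's texture onto another object's shape); the file names and the prompt-point choice are placeholder assumptions, not the paper's exact setup. Only the `segment_anything` API calls (`sam_model_registry`, `SamPredictor`) are from the official SAM library.

```python
# A minimal sketch of a texture-shape cue-conflict probe for SAM.
# Assumptions (not from the paper): the cue-conflict image and the
# ground-truth shape mask are generated offline; file names and the
# prompt point are hypothetical placeholders.
import numpy as np
from PIL import Image
from segment_anything import sam_model_registry, SamPredictor


def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection-over-union between two boolean masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter) / float(union) if union > 0 else 0.0


# Load SAM with an official checkpoint from the segment-anything repo.
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

# Hypothetical inputs: an image whose shape comes from object A but
# whose texture comes from object B, plus A's silhouette mask.
image = np.array(Image.open("cue_conflict.png").convert("RGB"))
shape_mask = np.array(Image.open("shape_mask.png").convert("L")) > 127

predictor.set_image(image)

# A single positive point prompt placed inside the object silhouette.
ys, xs = np.nonzero(shape_mask)
point = np.array([[int(xs.mean()), int(ys.mean())]])  # (x, y) order
masks, scores, _ = predictor.predict(
    point_coords=point,
    point_labels=np.array([1]),  # 1 = foreground point
    multimask_output=False,
)

# If SAM relied on shape, the predicted mask should track the
# silhouette; if it latches onto texture, IoU with the shape mask drops.
print(f"IoU(predicted mask, shape silhouette) = {iou(masks[0], shape_mask):.3f}")
```

A low IoU on cue-conflict images, relative to unmodified images, would be one way to quantify the texture bias the abstract reports; the paper's actual evaluation protocol may differ.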