FCSN: Global Context Aware Segmentation by Learning the Fourier Coefficients of Objects in Medical Images
Format: | Article |
Language: | English |
Online Access: | Order full text |
Abstract: | The encoder-decoder model is a commonly used Deep Neural Network (DNN) model for medical image segmentation. Conventional encoder-decoder models make pixel-wise predictions that focus heavily on local patterns around each pixel. This makes it challenging to produce segmentations that preserve the object's shape and topology, which often requires an understanding of the object's global context. In this work, we propose a Fourier Coefficient Segmentation Network~(FCSN) -- a novel DNN-based model that segments an object by learning the complex Fourier coefficients of the object's mask. The Fourier coefficients are calculated by integrating over the whole contour. Therefore, to estimate the coefficients precisely, the model is motivated to incorporate the global context of the object, leading to a more accurate segmentation of the object's shape. This global context awareness also makes our model robust to unseen local perturbations during inference, such as the additive noise or motion blur that are prevalent in medical images. When FCSN is compared with other state-of-the-art models (UNet+, DeepLabV3+, UNETR) on 3 medical image segmentation tasks (ISIC\_2018, RIM\_CUP, RIM\_DISC), FCSN attains significantly lower Hausdorff scores of 19.14 (6\%), 17.42 (6\%), and 9.16 (14\%) on the 3 tasks, respectively. Moreover, FCSN is lightweight because it discards the decoder module, which incurs significant computational overhead. FCSN requires only 22.2M parameters, 82M and 10M fewer than UNETR and DeepLabV3+, respectively. FCSN attains inference and training speeds of 1.6ms/img and 6.3ms/img, which are 8$\times$ and 3$\times$ faster than UNet and UNETR. |
DOI: | 10.48550/arxiv.2207.14477 |
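The abstract hinges on representing a segmentation mask by the complex Fourier coefficients of its boundary contour. As a minimal sketch of that idea (not the authors' implementation; the function names, the NumPy-only workflow, and the harmonic count are illustrative assumptions), the snippet below computes low-frequency complex Fourier coefficients of a sampled contour and reconstructs a dense contour from them:

```python
import numpy as np

def contour_fourier_coefficients(contour_xy, harmonics=32):
    """Complex Fourier descriptors of a closed contour (illustrative sketch).

    contour_xy: (N, 2) array of boundary points (e.g. extracted from a
    binary mask). Each point (x, y) is treated as a complex number
    z = x + i*y, and the descriptors are the DFT coefficients of that
    sequence, truncated to the `harmonics` lowest positive and negative
    frequencies plus the constant term (which encodes the centroid).
    """
    z = contour_xy[:, 0] + 1j * contour_xy[:, 1]
    c = np.fft.fft(z) / len(z)                    # c_k = (1/N) * sum_n z_n e^{-2*pi*i*k*n/N}
    return np.concatenate([c[:harmonics + 1],     # k = 0, 1, ..., harmonics
                           c[-harmonics:]])       # k = -harmonics, ..., -1

def reconstruct_contour(coeffs, num_points=256):
    """Rebuild a dense contour from the truncated coefficients (inverse DFT)."""
    harmonics = (len(coeffs) - 1) // 2
    spectrum = np.zeros(num_points, dtype=complex)
    spectrum[:harmonics + 1] = coeffs[:harmonics + 1]
    spectrum[-harmonics:] = coeffs[harmonics + 1:]
    z = np.fft.ifft(spectrum) * num_points        # undo ifft's 1/num_points scaling
    return np.stack([z.real, z.imag], axis=1)

# Round trip on a synthetic circular contour: only k = 0 (centroid) and
# k = 1 (radius) carry energy, so a handful of harmonics suffices.
t = np.linspace(0, 2 * np.pi, 400, endpoint=False)
circle = np.stack([100 + 50 * np.cos(t), 100 + 50 * np.sin(t)], axis=1)
coeffs = contour_fourier_coefficients(circle, harmonics=4)
recon = reconstruct_contour(coeffs, num_points=400)
print(np.abs(recon - circle).max())               # ~1e-13, i.e. an exact round trip
```

In the setting the abstract describes, a network would regress such truncated coefficients directly from the image, and the predicted contour (and hence the mask) would presumably be recovered with the inverse transform; the truncation level trades contour detail against the number of regression targets.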