QT-UNet: A Self-Supervised Self-Querying All-Transformer U-Net for 3D Segmentation

Bibliographic Details
Published in: IEEE Access, 2024, Vol. 12, pp. 62664-62676
Main Authors: Hammer Haversen, Andreas; Bavirisetti, Durga Prasad; Hanssen Kiss, Gabriel; Lindseth, Frank
Format: Article
Language: English
Subjects:
Online Access: Full text
Description
Abstract: With reliable performance and linear time complexity, Vision Transformers like the Swin Transformer are gaining popularity in the field of Medical Image Computing (MIC). Examples of effective volumetric segmentation models for brain tumours include VT-UNet, which combines conventional UNets with Swin Transformers using a unique encoder-decoder Cross-Attention (CA) paradigm. Self-Supervised Learning (SSL) has also seen increased adoption in computer vision domains such as MIC, in situations where labelled training data is scarce. The Querying Transformer UNet (QT-UNet) model we introduce in this paper brings these advancements together: it is an all-Swin Transformer UNet with an encoder-decoder CA mechanism strengthened by SSL. To evaluate the potential of QT-UNet as a generic volumetric segmentation model, we subject it to extensive testing on several MIC datasets. Our best model achieves an average Dice score of 88.61 and a Hausdorff Distance of 4.85 mm, making it competitive with the State of the Art in Brain Tumour Segmentation (BraTS) 2021 while using 40% fewer FLOPs than the baseline VT-UNet. We found poor results on Beyond The Cranial Vault (BTCV) and the Medical Segmentation Decathlon (MSD), but we validate the effectiveness of our new CA mechanism and find that the SSL pipeline is most effective when pre-trained with our CT-SSL dataset. The code can be found at https://github.com/AndreasHaaversen/QT-UNet.
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2024.3395058