TransUNet: Rethinking the U-Net architecture design for medical image segmentation through the lens of transformers

Bibliographic details
Published in:	Medical image analysis 2024-10, Vol.97, p.103280, Article 103280
Main authors:	Chen, Jieneng, Mei, Jieru, Li, Xianhang, Lu, Yongyi, Yu, Qihang, Wei, Qingyue, Luo, Xiangde, Xie, Yutong, Adeli, Ehsan, Wang, Yan, Lungren, Matthew P., Zhang, Shaoting, Xing, Lei, Lu, Le, Yuille, Alan, Zhou, Yuyin
Format:	Article
Language:	English
Subjects:	Algorithms; Humans; Image Processing, Computer-Assisted - methods; Medical image segmentation; Neural Networks, Computer; U-Net; Vision Transformers
Online access:	Full text

Description
Summary:	Medical image segmentation is crucial for healthcare, yet convolution-based methods like U-Net face limitations in modeling long-range dependencies. To address this, Transformers designed for sequence-to-sequence predictions have been integrated into medical image segmentation. However, a comprehensive understanding of Transformers’ self-attention in U-Net components is lacking. TransUNet, first introduced in 2021, is widely recognized as one of the first models to integrate Transformers into medical image analysis. In this study, we present the versatile framework of TransUNet that encapsulates Transformers’ self-attention into two key modules: (1) a Transformer encoder tokenizing image patches from a convolutional neural network (CNN) feature map, facilitating global context extraction, and (2) a Transformer decoder refining candidate regions through cross-attention between proposals and U-Net features. These modules can be flexibly inserted into the U-Net backbone, resulting in three configurations: Encoder-only, Decoder-only, and Encoder+Decoder. TransUNet provides a library encompassing both 2D and 3D implementations, enabling users to easily tailor the chosen architecture. Our findings highlight the encoder’s efficacy in modeling interactions among multiple abdominal organs and the decoder’s strength in handling small targets like tumors. TransUNet excels in diverse medical applications, such as multi-organ segmentation, pancreatic tumor segmentation, and hepatic vessel segmentation. Notably, our TransUNet achieves a significant average Dice improvement of 1.06% and 4.30% for multi-organ segmentation and pancreatic tumor segmentation, respectively, when compared to the highly competitive nnU-Net, and surpasses the top-1 solution in the BraTS2021 challenge. The 2D and 3D code and models are available at https://github.com/Beckschen/TransUNet and https://github.com/Beckschen/TransUNet-3D, respectively.
• Incorporating self- and cross-attention into U-Net for medical image segmentation.
• Transformer decoder with coarse-to-fine attention to enhance small tumor segmentation.
• TransUNet enhances U-Net’s encoding/decoding, surpassing nnU-Net on multiple tasks.
• Our codebase supports 2D and 3D implementations to foster future exploration.
ISSN:	1361-8415, 1361-8423
DOI:	10.1016/j.media.2024.103280
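
Note on the Dice metric: the summary reports results as average Dice improvements. The sketch below shows how a per-label Dice score and its average over labels are commonly computed for a segmentation. It is a minimal illustration and is not taken from the TransUNet codebase. The function names `dice_score` and `mean_dice`, the flat-list mask representation, and the convention that two empty masks score 1.0 are all assumptions made for this example.

```python
def dice_score(pred, target):
    """Dice = 2 * |P and T| / (|P| + |T|) for two equal-length binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of elements")
    # Count positions that are foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground count across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total


def mean_dice(pred_labels, target_labels, labels):
    """Average the per-label Dice over the given label ids (hypothetical helper)."""
    scores = []
    for label in labels:
        pred_mask = [p == label for p in pred_labels]
        target_mask = [t == label for t in target_labels]
        scores.append(dice_score(pred_mask, target_mask))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy flattened label maps: 0 = background, 1 and 2 = two structures.
    prediction = [0, 1, 1, 2, 2, 0]
    reference = [0, 1, 0, 2, 2, 2]
    print(mean_dice(prediction, reference, labels=[1, 2]))
```

Published scores are usually this value expressed as a percentage and averaged over all cases in a test set; the exact averaging protocol used in the article is not stated in this record.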