TransVFS: A spatio-temporal local–global transformer for vision-based force sensing during ultrasound-guided prostate biopsy


Bibliographic details
Published in: Medical image analysis, 2024-05, Vol. 94, p. 103130, Article 103130
Main authors: Wang, Yibo; Ye, Zhichao; Wen, Mingwei; Liang, Huageng; Zhang, Xuming
Format: Article
Language: English
Online access: Full text
Description
Abstract: Robot-assisted prostate biopsy is an emerging technology for diagnosing prostate cancer, but its safety is limited by the robot's inability to sense the tool-tissue interaction force accurately during biopsy. Vision-based force sensing (VFS) has recently offered a potential solution by inferring the interaction force from image sequences. However, existing mainstream VFS methods cannot sense force accurately because they rely on convolutional or recurrent neural networks to learn deformation from optical images, and some of them are inefficient, especially when recurrent convolutional operations are involved. This paper presents a Transformer-based VFS (TransVFS) method that leverages ultrasound volume sequences acquired during prostate biopsy. TransVFS uses a spatio-temporal local–global Transformer to capture local image details and global dependencies simultaneously, learning prostate deformations for force estimation. Distinctively, the method exploits both spatial and temporal attention mechanisms for image feature learning, thereby mitigating the effects of low ultrasound image resolution and unclear prostate boundaries on force estimation accuracy. Meanwhile, two efficient local–global attention modules reduce the 4D spatio-temporal computation burden through a factorized spatio-temporal processing strategy, enabling fast force estimation. Experiments on a prostate phantom and on beagle dogs show that the method significantly outperforms existing VFS methods and other spatio-temporal Transformer models. TransVFS surpasses the most competitive compared method, ResNet3dGRU, with mean absolute force-estimation errors of 70.4 ± 60.0 millinewton (mN) vs. 123.7 ± 95.6 mN on the transabdominal ultrasound dataset of dogs.
Highlights:
• A 4D ultrasound-based deep learning method for sensing force during prostate biopsy.
• A spatio-temporal local–global Transformer for prostate deformation learning.
• Spatio-temporal attention modules for feature extraction from ultrasound volumes.
• A factorized spatio-temporal scheme for reducing the computation cost of the 4D Transformer.
• Outperforming force sensing performance of the method on four in-house datasets.
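The factorized spatio-temporal strategy mentioned in the abstract can be illustrated with a minimal numpy sketch. This is not the paper's actual module (which also involves local–global attention partitioning and learned projections); it only shows the core idea: applying self-attention spatially within each frame and then temporally across frames, which reduces the attention cost from O((T·N)²) for joint 4D attention to roughly O(T·N² + N·T²) for T frames of N spatial tokens each. The function names and shapes are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product self-attention; batch dims broadcast.
    d = q.shape[-1]
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(d)
    return softmax(scores) @ v

def factorized_st_attention(x):
    """x: (T, N, d) token grid -- T frames, N spatial tokens per frame.

    Factorized scheme: spatial attention within each frame, then
    temporal attention across frames at each spatial location.
    (Real modules would add query/key/value projections.)
    """
    # Spatial step: each frame attends within itself -> (T, N, d)
    x = attention(x, x, x)
    # Temporal step: regroup so tokens at the same spatial site
    # form a sequence over time -> (N, T, d)
    xt = np.swapaxes(x, 0, 1)
    xt = attention(xt, xt, xt)
    return np.swapaxes(xt, 0, 1)  # back to (T, N, d)

out = factorized_st_attention(np.random.randn(4, 16, 8))
```

The two attention passes together let every token influence every other token indirectly (same frame in the first pass, same location in the second), which is why the factorization preserves global context at a fraction of the joint-attention cost.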
ISSN: 1361-8415, 1361-8423
DOI: 10.1016/j.media.2024.103130