TMF-Net: A Transformer-based Multiscale Fusion Network for Surgical Instrument Segmentation from Endoscopic Images

Bibliographic Details
Published in: IEEE Transactions on Instrumentation and Measurement 2023-01, Vol.72, p.1-1
Main authors: Yang, Lei; Gu, Yuge; Bian, Guibin; Liu, Yanhong
Format: Article
Language: English
Description
Summary: Automatic surgical instrument segmentation is a necessary step for the steady operation of surgical robots, and segmentation accuracy directly affects the surgical outcome. Nevertheless, accurate surgical instrument segmentation from endoscopic images remains challenging because of the complex environment and instrument motion during surgery. Based on the encoder-decoder structure, a transformer-based multiscale fusion network, named TMF-Net, is proposed to address these difficulties. To obtain an effective feature representation, a dual-encoder unit combines a pretrained ResNet34 with a transformer so that the strengths of both encoders are retained: the unit simultaneously learns the semantic relationships between adjacent pixels and between distant pixels, and comprehensively captures global context information. Meanwhile, to retain more contextual information, a trapezoid atrous spatial pyramid pooling (trapezoid ASPP) block is proposed, which enhances local features with different receptive fields to enrich feature information. Furthermore, to handle the multiscale surgical instruments found in endoscopic images, a multiscale attention fusion (MAF) block is proposed to fuse multiscale feature maps and direct the network's attention to the most effective channels, improving segmentation accuracy. Two typical datasets, Kvasir-Instrument and Endovis2017, are used for performance analysis and verification. Experimental results indicate that the proposed TMF-Net can effectively improve segmentation accuracy on surgical instruments and yields a competitive segmentation result in comparison with advanced detection methods.
ISSN: 0018-9456, 1557-9662
DOI: 10.1109/TIM.2022.3225922
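
The summary mentions atrous (dilated) convolution branches with different receptive fields. The following is a minimal pure-Python sketch of that general idea only, not the paper's trapezoid ASPP block, whose layout is not given in this record. The 1-D setting, the dilation rates (1, 2, 4), and all function names are illustrative assumptions.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation):
    """'Valid' 1-D atrous convolution (cross-correlation, no padding)."""
    span = effective_kernel_size(len(kernel), dilation)
    out = []
    for start in range(len(signal) - span + 1):
        # Kernel taps are spaced `dilation` samples apart.
        out.append(
            sum(w * signal[start + i * dilation] for i, w in enumerate(kernel))
        )
    return out


def parallel_atrous_branches(signal, kernel, dilations):
    """Apply the same kernel at several dilation rates, one output per rate.

    Hypothetical helper: a real ASPP block would use learned 2-D kernels
    and then merge the branch outputs.
    """
    return {d: dilated_conv1d(signal, kernel, d) for d in dilations}


if __name__ == "__main__":
    signal = list(range(10))
    kernel = [1, 1, 1]
    # Assumed dilation rates, chosen only for illustration.
    for d, out in parallel_atrous_branches(signal, kernel, (1, 2, 4)).items():
        print(d, effective_kernel_size(len(kernel), d), out)
```

With a 3-tap kernel, dilation rates 1, 2, and 4 cover spans of 3, 5, and 9 samples, so each branch sees a wider context at the same parameter count.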