An attention-guided multi-scale fusion network for surgical instrument segmentation


Detailed description

Bibliographic details
Published in: Biomedical Signal Processing and Control, 2025-04, Vol. 102, p. 107296, Article 107296
Authors: Song, Mengqiu; Zhai, Chenxu; Yang, Lei; Liu, Yanhong; Bian, Guibin
Format: Article
Language: English
Online access: Full text
Description
Abstract: In contemporary surgical practice, minimally invasive surgery has significantly alleviated the physiological and psychological strain on patients while dramatically curtailing their recovery periods. Within the realm of robot-assisted minimally invasive surgery, the precise segmentation of surgical instruments assumes paramount importance, as it not only enhances the precision with which surgeons execute surgical maneuvers but also fortifies the overall perioperative safety of patients. Despite these benefits, the accurate segmentation of surgical instruments remains beset by a multitude of challenges, stemming primarily from the intricacy of the surgical environment, specular reflection, diverse instrument types, etc. To effectively confront these challenges, this paper introduces a novel attention-guided multi-scale fusion network. Specifically, to facilitate effective feature representation, a backbone network leveraging Octave convolution is constructed to mitigate feature redundancy. Simultaneously, the encoding path incorporates a Transformer module into the bottleneck layer to infuse global contextual information, thereby capturing both global and local features. Moreover, a dual attention fusion block and a context feature fusion block are integrated into the skip connections to refine local features, discern edge details, and suppress interference from irrelevant information. Lastly, this paper presents an adaptive multi-scale feature weighting block, which fuses multi-scale features from different layers within the decoding path. To rigorously substantiate the performance of the proposed model, comprehensive experiments are conducted on two widely recognized benchmark datasets. The results reach a Dice score of 96.34% and a mIOU value of 96.14% on the Kvasir-Instrument dataset. Meanwhile, the model also reaches a Dice score of 97.31% and a mIOU value of 96.15% on the EndoVis2017 dataset. These results attest to the superiority of the proposed network in accuracy and robustness compared with advanced segmentation models. Therefore, the proposed model could offer a promising solution to enhance the precision and safety of robot-assisted minimally invasive surgeries.
• To obtain strong feature representation, an effective backbone network is devised.
• Local feature enhancement is realized via a dual attention fusion block and a context feature fusion block.
• An adaptive multi-scale feature weighting block fuses multi-scale features from different layers of the decoding path.
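The adaptive multi-scale feature weighting idea described in the abstract can be illustrated with a minimal sketch: decoder features from several resolutions are upsampled to a common size and combined with softmax-normalized per-scale weights. This is an assumption-laden simplification, not the paper's exact block; the function name `adaptive_multiscale_fusion`, the nearest-neighbour upsampling, and the scalar per-scale logits are all illustrative choices (in the real network the weights would be learned).

```python
import numpy as np

def upsample_nn(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) feature map."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def adaptive_multiscale_fusion(features, logits):
    """Fuse multi-scale decoder features with softmax-normalized weights.

    features : list of (C, H_i, W_i) arrays, finest resolution first,
               each coarser map exactly half the previous size.
    logits   : one scalar score per scale (learned in a real network).
    """
    target_h = features[0].shape[1]
    weights = np.exp(logits - np.max(logits))
    weights /= weights.sum()                    # softmax over scales
    fused = np.zeros_like(features[0], dtype=float)
    for w, f in zip(weights, features):
        factor = target_h // f.shape[1]         # upsample to finest scale
        fused += w * upsample_nn(f, factor)     # weighted sum at full res
    return fused

# Illustrative usage with random features at three scales
feats = [np.random.rand(8, 32, 32),
         np.random.rand(8, 16, 16),
         np.random.rand(8, 8, 8)]
out = adaptive_multiscale_fusion(feats, np.array([0.5, 1.0, 0.2]))
assert out.shape == (8, 32, 32)
```

The softmax over scale logits lets the network emphasize whichever resolution carries the most useful information for a given prediction, rather than summing all scales with fixed weights.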
ISSN: 1746-8094
DOI: 10.1016/j.bspc.2024.107296