Automatic glottis segmentation for laryngeal endoscopic images based on U-Net


Bibliographic details
Published in: Biomedical signal processing and control 2022-01, Vol. 71, p. 103116, Article 103116
Main authors: Ding, Huijun; Cen, Qian; Si, Xiaoyu; Pan, Zhanpeng; Chen, Xiangdong
Format: Article
Language: English
Online access: Full text
Description
Abstract: The morphology of the glottis not only reflects vocal and respiratory information but also plays an important role in the diagnosis of laryngeal diseases. Glottis segmentation is a primary step in computer-aided diagnostic systems, but it is challenging because of the varied shapes of the glottis, its low contrast with surrounding tissues, the presence of laryngeal diseases, and other factors. In this paper, a deep attention network based on U-Net with a color normalization operation (CN-DA-Unet) is proposed to achieve end-to-end segmentation of the glottal area for the first time. The original images are first processed by color normalization to reduce the adverse effects of low contrast and of large color differences between images. The normalized images are then passed to the proposed DA-Unet for feature extraction. In this network, a residual structure is incorporated to extract rich features from deep neural networks. After feature extraction, a feature pyramid attention (FPA) module is applied to enhance the semantic information of the glottal area. These features are repeatedly up-sampled and added to the features from the corresponding encoding layer to obtain the final segmented image. The proposed approach is tested on laryngeal images from an in-house dataset that includes images from healthy and pathologic subjects. Its performance is evaluated with several reliable and popular metrics, achieving a Dice coefficient of 92.9%, a sensitivity of 93.5%, and a precision of 92.6%. These results demonstrate the effectiveness of the proposed approach and its better performance compared with several popular networks.
ISSN: 1746-8094, 1746-8108
DOI:10.1016/j.bspc.2021.103116
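
The abstract reports a Dice coefficient, sensitivity, and precision for the predicted glottal masks. As an illustration only, the following minimal Python sketch shows how these three metrics are commonly computed from a binary predicted mask and a binary ground-truth mask. It is not the authors' evaluation code: the function names `confusion_counts` and `segmentation_metrics`, the nested-list mask representation, the toy masks, and the convention of returning 1.0 when a denominator is zero are all assumptions made for this example.

```python
def confusion_counts(pred, truth):
    """Count true positives, false positives, and false negatives.

    Both masks are equally sized nested lists of 0/1 values
    (a hypothetical representation chosen for this sketch).
    """
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1      # pixel correctly labeled as glottis
            elif p and not t:
                fp += 1      # pixel wrongly labeled as glottis
            elif t and not p:
                fn += 1      # glottis pixel that was missed
    return tp, fp, fn


def segmentation_metrics(pred, truth):
    """Return Dice, sensitivity, and precision as fractions in [0, 1]."""
    tp, fp, fn = confusion_counts(pred, truth)
    # Assumed convention: an empty denominator yields 1.0.
    dice = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 1.0
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    return {"dice": dice, "sensitivity": sensitivity, "precision": precision}


# Toy 3x4 masks, invented for illustration (not data from the paper).
truth = [[0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 0]]
pred = [[0, 1, 1, 0],
        [0, 1, 0, 0],
        [0, 1, 0, 0]]

metrics = segmentation_metrics(pred, truth)
print(metrics)
```

For the toy masks there are 3 true positives, 1 false positive, and 1 false negative, so all three metrics equal 0.75. The values in the abstract are percentages for the whole test set; how the authors aggregate per-image scores is not stated in this record.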