Semi-supervised segmentation of echocardiography videos via noise-resilient spatiotemporal semantic calibration and fusion

Bibliographic details
Published in: Medical Image Analysis, 2022-05, Vol. 78, p. 102397-102397, Article 102397
Main authors: Wu, Huisi; Liu, Jiasheng; Xiao, Fangyan; Wen, Zhenkun; Cheng, Lan; Qin, Jing
Format: Article
Language: English
Description
Abstract:
• A novel semi-supervised model for left ventricle endocardium segmentation from echocardiography videos.
• A temporal context-aware feature extraction module to learn relatively distinctive feature representations for the same frame.
• An adaptive spatiotemporal semantic calibration method to adaptively align features of consecutive frames.
• A bi-directional spatiotemporal semantics fusion module to make full use of the spatiotemporal coherence.

We present a novel model for left ventricle endocardium segmentation from echocardiography videos, a task of great clinical significance that remains challenging because of (1) the severe speckle noise in echocardiography videos, (2) the irregular motion of pathological hearts, and (3) the limited training data caused by high annotation cost. The proposed model has three key characteristics. First, we propose a novel adaptive spatiotemporal semantic calibration method to align the feature maps of consecutive frames; the spatiotemporal correspondences are established from feature maps instead of pixels, which mitigates the adverse effects of speckle noise on the calibration. Second, we learn the importance of each feature map of neighbouring frames to the current frame from the temporal perspective, so that temporal information is weighted selectively rather than uniformly to handle irregular and anisotropic motion. Third, we integrate these techniques into the mean teacher semi-supervised architecture to leverage a large amount of unlabeled data and improve segmentation accuracy. We extensively evaluate the proposed method on two public echocardiography video datasets (EchoNet-Dynamic and CAMUS), where the average Dice coefficient for left ventricular endocardium segmentation reaches 92.87% and 93.79%, respectively. Comparisons with state-of-the-art methods further demonstrate the effectiveness of the proposed method, which achieves better segmentation performance at a faster speed.
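The abstract relies on two standard ingredients that it does not spell out: the Dice coefficient used for evaluation and the mean teacher scheme, in which a teacher network's weights track an exponential moving average of the student's weights. The following is a minimal sketch of both, assuming binary masks flattened to 0/1 sequences and weights stored as plain lists of floats. The function names, the decay value, and the empty-mask convention are illustrative assumptions and are not taken from the paper.

```python
from typing import Dict, List, Sequence


def dice_coefficient(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice = 2 * |A and B| / (|A| + |B|) for two flat binary (0/1) masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def ema_update(
    teacher: Dict[str, List[float]],
    student: Dict[str, List[float]],
    decay: float = 0.99,  # hypothetical value, chosen only for illustration
) -> Dict[str, List[float]]:
    """Mean teacher step: teacher <- decay * teacher + (1 - decay) * student."""
    return {
        name: [decay * t + (1.0 - decay) * s
               for t, s in zip(teacher[name], student[name])]
        for name in teacher
    }


if __name__ == "__main__":
    # One overlapping pixel, three foreground pixels in total.
    print(dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0]))
    # The teacher moves a small step towards the student.
    print(ema_update({"w": [0.0, 1.0]}, {"w": [1.0, 1.0]}, decay=0.9))
```

In a mean teacher setup, the student is typically trained with a supervised loss on labeled frames plus a consistency loss against the teacher's predictions on unlabeled frames; the exact losses and hyperparameters used by the authors are not given in this record.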
ISSN: 1361-8415, 1361-8423
DOI: 10.1016/j.media.2022.102397