Learning from multiple annotators for medical image segmentation

Bibliographic details
Published in: Pattern Recognition, June 2023, Vol. 138, Article 109400
Main authors: Zhang, Le, Tanno, Ryutaro, Xu, Moucheng, Huang, Yawen, Bronik, Kevin, Jin, Chen, Jacob, Joseph, Zheng, Yefeng, Shao, Ling, Ciccarelli, Olga, Barkhof, Frederik, Alexander, Daniel C.
Format: Article
Language: English
Description
Abstract:
Highlights
• A novel deep CNN architecture is proposed for jointly learning the expert consensus label and each annotator's label. The proposed architecture (Fig. 1) consists of two coupled CNNs: one estimates the expert consensus label probabilities, and the other models the characteristics of individual annotators (e.g., a tendency to over-segment, or confusion between different classes) by estimating pixel-wise confusion matrices (CMs) for each image. Unlike STAPLE [25] and its variants, our method uses deep neural networks to model and disentangle the complex mappings from the input images to the annotator behaviours and to the expert consensus label.
• The parameters of our CNNs are "global variables" that are optimised across different image samples. This enables the model to robustly disentangle the annotators' mistakes from the expert consensus label based on correlations between similar image samples, even when only a few annotations are available per image (e.g., a single annotation per image). This would not be possible with STAPLE [25] and its variants [5,8], where the annotators' parameters are estimated separately for every target image.
• This paper extends the preliminary version of our method presented at the Thirty-fourth Annual Conference on Neural Information Processing Systems (NeurIPS) [30] by extensively evaluating our model on a newly created real-world multiple sclerosis lesion dataset from QSMSC at UCL (Queen Square Multiple Sclerosis Center at UCL, UK). This dataset contains manual segmentations from four different annotators: three radiologists with different levels of skill, and one expert who generated the expert consensus label. Additionally, we present a comprehensive discussion of our model's potential applications (e.g., estimating annotator quality and annotation quality), directions for future work, and the potential limitations of our model.
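The pixel-wise CM idea in the highlights can be illustrated with a small NumPy sketch: at each pixel, an annotator's label distribution is modelled as that annotator's CM applied to the consensus label distribution. Everything below is an illustrative assumption rather than the authors' code: the tensor layout, the function names `annotator_probs` and `masked_nll`, the plain masked negative log-likelihood, and the random arrays standing in for the outputs of the two CNNs. The paper's exact training objective (including any regulariser on the CMs) is not given in this record. The boolean mask mirrors the case of a single annotation per image.

```python
import numpy as np


def softmax(logits, axis):
    # Numerically stable softmax along `axis`.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def annotator_probs(consensus_probs, cms):
    """Pass the consensus distribution through each annotator's pixel-wise CM.

    consensus_probs: (H, W, L) estimated consensus probabilities per pixel.
    cms: (R, H, W, L, L), cms[r, h, w, i, j] = P(annotator r labels i | consensus j).
    Returns (R, H, W, L): sum over j of cms[r, h, w, i, j] * consensus_probs[h, w, j].
    """
    return np.einsum("rhwij,hwj->rhwi", cms, consensus_probs)


def masked_nll(noisy_probs, labels, observed, eps=1e-8):
    """Mean negative log-likelihood of the annotations that actually exist.

    labels: (R, H, W) integer label maps; observed: (R,) bool, False for
    annotators who did not segment this image (hypothetical masking scheme).
    """
    # Probability that each annotator's predicted distribution assigns to their observed label.
    picked = np.take_along_axis(noisy_probs, labels[..., None], axis=-1)[..., 0]
    nll = -np.log(picked + eps)  # (R, H, W)
    return nll[observed].mean()


# Toy example: 3 annotators, only one of whom labelled this image.
rng = np.random.default_rng(0)
H, W, L, R = 4, 4, 2, 3
# Random stand-ins for the two CNNs' outputs (assumption, for illustration only).
consensus = softmax(rng.normal(size=(H, W, L)), axis=-1)
# Normalise over i (axis -2) so every column j of each CM is a distribution.
cms = softmax(rng.normal(size=(R, H, W, L, L)), axis=-2)
noisy = annotator_probs(consensus, cms)
labels = rng.integers(0, L, size=(R, H, W))
observed = np.array([True, False, False])
loss = masked_nll(noisy, labels, observed)
print(noisy.shape, float(loss))
```

Because each CM column sums to one, every predicted annotator distribution also sums to one, and an identity CM reproduces the consensus exactly. In a trainable version these arrays would come from the two networks in an autodiff framework, so the loss could be back-propagated into both.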
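The highlights list estimating annotator quality as a potential application of the estimated CMs. One simple proxy, assumed here and not taken from the paper, is the normalised trace of each annotator's column-stochastic CMs averaged over pixels: a diagonal entry is the probability of reproducing the consensus label, so a score of 1 means the CMs are the identity everywhere. The function name `annotator_skill` and the unweighted pixel average are hypothetical choices.

```python
import numpy as np


def annotator_skill(cms):
    """Hypothetical per-annotator quality score from estimated pixel-wise CMs.

    cms: (R, H, W, L, L) column-stochastic confusion matrices.
    Returns (R,): mean over pixels of trace(CM) / L, which lies in [0, 1].
    """
    L = cms.shape[-1]
    traces = np.trace(cms, axis1=-2, axis2=-1) / L  # (R, H, W)
    return traces.mean(axis=(1, 2))


L = 2
perfect = np.broadcast_to(np.eye(L), (1, 3, 3, L, L))
# An annotator who labels consensus background (0) as lesion (1) 30% of the time,
# i.e. a tendency to over-segment.
over_seg_cm = np.array([[0.7, 0.0],
                        [0.3, 1.0]])
over_seg = np.broadcast_to(over_seg_cm, (1, 3, 3, L, L))
scores = annotator_skill(np.concatenate([perfect, over_seg]))
print(scores)  # values approximately 1.0 and 0.85
```

Averaging uniformly over pixels lets large background regions dominate the score; weighting by the estimated consensus class frequencies would be one alternative, again an assumption rather than the paper's method.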
Supervised machine learning methods have been widely developed for segmentation tasks in recent years. However, the quality of labels has a high impact on the predictive performance of these algorithms. This issue is particularly acute in the medical imaging domain, where both the cost of annotation and the inter-observer variability are high. In a typical label acquisition process, different human experts provide estimates of the "actual" segmentation labels, influenced by their personal biases and competency levels. The performa
ISSN: 0031-3203 (print); 1873-5142 (online)
DOI: 10.1016/j.patcog.2023.109400