Computer-aided segmentation on MRI for prostate radiotherapy, part II: Comparing human and computer observer populations and the influence of annotator variability on algorithm variability
Published in: | Radiotherapy and oncology 2022-04, Vol.169, p.132-139 |
---|---|
Main authors: | , , , , , , , , |
Format: | Article |
Language: | eng |
Summary: | • Different loss functions for developing deep learning (DL) algorithms can change prostate and organs at risk (OAR) boundaries, particularly in anatomical regions with high interobserver variability. • DL-based automatic segmentation algorithms exhibit high variability in the same anatomical regions as the humans who annotated the images for the DL algorithm development. • Spatial entropy maps provide an intuitive characterization of voxel-wise segmentation variability. • DL-based automatic segmentation algorithms can be more consistent than human observers in delineating the prostate and OARs on MRIs for prostate radiotherapy. • Segmentation performance on T2-weighted planning MRIs was comparable to that on T2/T1-weighted postimplant MRIs.
Comparing deep learning (DL) algorithms to human interobserver variability, one of the largest sources of noise in human-performed annotations, is necessary to inform the clinical application, use, and quality assurance of DL for prostate radiotherapy.
One hundred fourteen DL algorithms were developed on 295 prostate MRIs to segment the prostate, external urinary sphincter (EUS), seminal vesicles (SV), rectum, and bladder. Fifty prostate MRIs of 25 patients undergoing MRI-based low-dose-rate prostate brachytherapy were acquired as an independent test set. Groups of DL algorithms were created based on the loss functions used to train them, and the spatial entropy (SE) of their predictions on the 50 test MRIs was computed. Five human observers contoured the 50 test MRIs, and SE maps of their contours were compared with those of the groups of the DL algorithms. Additionally, similarity metrics were computed between DL algorithm predictions and consensus annotations of the 5 human observers’ contours of the 50 test MRIs.
A DL algorithm yielded statistically significantly higher similarity metrics for the prostate than did the human observers (H) (prostate Matthews correlation coefficient, DL vs. H: planning: 0.931 vs. 0.903, p |
---|---|
ISSN: | 0167-8140 1879-0887 |
DOI: | 10.1016/j.radonc.2021.12.033 |
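
The abstract refers to two quantities that are easy to illustrate: a voxel-wise spatial entropy map over a population of observers, and the Matthews correlation coefficient between a predicted and a reference mask. The sketch below is only an illustration under stated assumptions. It treats each structure as a binary mask and uses the binary Shannon entropy in bits, which may differ from the definition used in the article. The function names `entropy_map` and `matthews_cc` are hypothetical and do not come from the article.

```python
import numpy as np


def entropy_map(masks):
    """Voxel-wise binary Shannon entropy (bits) across observers.

    masks: array-like of shape (n_observers, ...) holding 0/1 labels.
    Assumption: one binary mask per observer for a single structure.
    """
    masks = np.asarray(masks, dtype=float)
    # Fraction of observers that labelled each voxel as foreground.
    p = masks.mean(axis=0)
    h = np.zeros_like(p)
    # Entropy is zero wherever all observers agree (p == 0 or p == 1).
    mixed = (p > 0) & (p < 1)
    q = p[mixed]
    h[mixed] = -(q * np.log2(q) + (1 - q) * np.log2(1 - q))
    return h


def matthews_cc(pred, ref):
    """Matthews correlation coefficient between two binary masks."""
    pred = np.asarray(pred, dtype=bool).ravel()
    ref = np.asarray(ref, dtype=bool).ravel()
    tp = float(np.sum(pred & ref))
    tn = float(np.sum(~pred & ~ref))
    fp = float(np.sum(pred & ~ref))
    fn = float(np.sum(~pred & ref))
    denom = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention assumed here: return 0 when the coefficient is undefined.
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0


# Toy example: two observers disagree on one of four voxels.
observers = np.array([[1, 1, 0, 0],
                      [1, 0, 0, 0]])
se = entropy_map(observers)
mcc = matthews_cc(observers[0], observers[1])
```

In this toy case the entropy is nonzero only at the voxel where the two observers disagree, which is the behaviour that makes such maps a compact view of where a population of human or DL observers is inconsistent.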