Comparing deep learning-based automatic segmentation of breast masses to expert interobserver variability in ultrasound imaging
Published in: | Computers in Biology and Medicine, 2021-12, Vol. 139, p. 104966, Article 104966 |
---|---|
Format: | Article |
Language: | English |
Abstract: | Deep learning is a powerful tool that became practical in 2008 by harnessing the power of graphics processing units (GPUs), and it has developed rapidly in image, video, and natural language processing. There are ongoing developments in the application of deep learning to medical data for a variety of tasks across multiple imaging modalities. The reliability and repeatability of deep learning techniques are of utmost importance if deep learning is to be considered a tool for assisting experts, including physicians, radiologists, and sonographers. Owing to the high cost of labeling data, deep learning models are often evaluated against a single expert, and it is unknown whether their errors fall within a clinically acceptable range. Ultrasound is a commonly used imaging modality for breast cancer screening and for visually estimating risk using the Breast Imaging Reporting and Data System (BI-RADS) score. This process is highly dependent on the skills and experience of the sonographers and radiologists, which leads to interobserver variability in interpretation. For these reasons, we propose an interobserver reliability study comparing the performance of a current top-performing deep learning segmentation model against three experts who manually segmented suspicious breast lesions in clinical ultrasound (US) images. We pretrained the model using a US thyroid segmentation dataset with 455 patients and 50,993 images, and trained the model using a US breast segmentation dataset with 733 patients and 29,884 images. We found a mean Fleiss kappa value of 0.78 for the agreement among the three experts in breast mass segmentation, compared to a mean Fleiss kappa value of 0.79 for the agreement among the experts and the optimized deep learning model.
•Automatic breast mass segmentation performed statistically on par with human experts.
•The proposed model had a mean Dice coefficient comparable to that of human experts.
•The proposed model had a mean Hausdorff distance comparable to that of human experts.
•A focal Matthews correlation coefficient is proposed as a loss function. |
---|---|
ISSN: | 0010-4825, 1879-0534 |
DOI: | 10.1016/j.compbiomed.2021.104966 |
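
The abstract reports agreement using Fleiss' kappa and the Dice coefficient. As a rough illustration of how such metrics can be computed on binary segmentation masks, here is a minimal Python sketch. It assumes each pixel is treated as one subject rated "lesion" or "background" by every rater; the article's exact procedure is not given in this record, and the function names `dice_coefficient` and `fleiss_kappa_masks` are hypothetical.

```python
import numpy as np


def dice_coefficient(a, b):
    """Dice overlap between two binary masks (defined here as 1.0 if both are empty)."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0
    return 2.0 * np.logical_and(a, b).sum() / total


def fleiss_kappa_masks(masks):
    """Fleiss' kappa over pixels for a list of same-shaped binary masks.

    Assumption: every pixel is a subject and every mask is one rater
    assigning it to one of two categories (background, lesion).
    """
    stack = np.stack([np.asarray(m, dtype=bool).ravel() for m in masks])
    n_raters, n_pixels = stack.shape
    lesion_votes = stack.sum(axis=0)
    # counts[i, j] = number of raters placing pixel i in category j
    counts = np.stack([n_raters - lesion_votes, lesion_votes], axis=1)
    # Observed agreement per pixel, then averaged over all pixels
    p_i = (np.sum(counts * counts, axis=1) - n_raters) / (n_raters * (n_raters - 1))
    p_bar = p_i.mean()
    # Chance agreement from the overall category proportions
    p_j = counts.sum(axis=0) / (n_pixels * n_raters)
    p_e = np.sum(p_j ** 2)
    if p_e == 1.0:
        # Degenerate case (a single category everywhere); returning 1.0 is a choice made here
        return 1.0
    return (p_bar - p_e) / (1.0 - p_e)
```

Under these assumptions, the reported comparison would correspond to calling `fleiss_kappa_masks` once on the three expert masks and once on the expert masks plus the model mask for each image, then averaging over images.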