Image based prognosis in head and neck cancer using convolutional neural networks: a case study in reproducibility and optimization

Bibliographic details
Published in:	Scientific reports 2023-10, Vol.13 (1), p.18176-18176, Article 18176
Main authors:	Mateus, Pedro; Volmer, Leroy; Wee, Leonard; Aerts, Hugo J. W. L.; Hoebers, Frank; Dekker, Andre; Bermejo, Inigo
Format:	Article
Language:	English
Subjects:	639/705/117; 692/308; 692/308/53; 692/308/53/2422; 692/699/67; 692/699/67/2321; Deep learning; Head & neck cancer; Humanities and Social Sciences; Image processing; multidisciplinary; Neural networks; Patients; Reproducibility; Science; Science (multidisciplinary)
Online access:	Full text

Description
Abstract:	In the past decade, there has been a sharp increase in publications describing applications of convolutional neural networks (CNNs) in medical image analysis. However, recent reviews have warned of the lack of reproducibility of most such studies, which has impeded closer examination of the models and, in turn, their implementation in healthcare. On the other hand, the performance of these models is highly dependent on decisions on architecture and image pre-processing. In this work, we assess the reproducibility of three studies that use CNNs for head and neck cancer outcome prediction by attempting to reproduce the published results. In addition, we propose a new network structure and assess the impact of image pre-processing and model selection criteria on performance. We used two publicly available datasets: one with 298 patients for training and validation and another with 137 patients from a different institute for testing. All three studies failed to report elements required to reproduce their results thoroughly, mainly the image pre-processing steps and the random seed. Our model either outperforms or achieves similar performance to the existing models with considerably fewer parameters. We also observed that the pre-processing efforts significantly impact the model’s performance and that some model selection criteria may lead to suboptimal models. Although there have been improvements in the reproducibility of deep learning models, our work suggests that wider implementation of reporting standards is required to avoid a reproducibility crisis.
ISSN:	2045-2322
DOI:	10.1038/s41598-023-45486-5