The NCI Imaging Data Commons as a platform for reproducible research in computational pathology
Published in: | Computer methods and programs in biomedicine 2023-12, Vol.242, p.107839-107839, Article 107839 |
---|---|
Format: | Article |
Language: | English |
Online access: | Full text |
Summary: | • The Imaging Data Commons (IDC) is a new repository of FAIR cancer image collections.
• Introduction to using the IDC for reproducible research in computational pathology.
• The IDC and cloud-based machine learning services facilitate reproducibility in complementary ways.
• Evaluation results indicate a practical reproducibility limit.
• Categorization of key reproducibility challenges of computational pathology studies.
Reproducibility is a major challenge in developing machine learning (ML)-based solutions in computational pathology (CompPath). The NCI Imaging Data Commons (IDC) provides >120 cancer image collections according to the FAIR principles and is designed to be used with cloud ML services. Here, we explore its potential to facilitate reproducibility in CompPath research.
Using the IDC, we implemented two experiments in which a representative ML-based method for classifying lung tumor tissue was trained and/or evaluated on different datasets. To assess reproducibility, the experiments were run multiple times with separate but identically configured instances of common ML services.
The results of different runs of the same experiment were reproducible to a large extent. However, we observed occasional, small variations in AUC values, indicating a practical limit to reproducibility.
We conclude that the IDC facilitates approaching the reproducibility limit of CompPath research (i) by enabling researchers to reuse exactly the same datasets and (ii) by integrating with cloud ML services so that experiments can be run in identically configured computing environments. |
---|---|
ISSN: | 0169-2607 1872-7565 |
DOI: | 10.1016/j.cmpb.2023.107839 |
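
The reproducibility check described in the summary (running the same experiment several times and comparing AUC values across runs) can be illustrated with a minimal sketch. This is not the authors' code: the function names `roc_auc` and `auc_spread`, the labels, and the per-run scores are all hypothetical, and the toy data is far smaller than any real evaluation set, so the spread it shows is much coarser than the small variations reported in the study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive sample is scored
    higher than a random negative sample (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def auc_spread(labels, runs):
    """Return the AUC of each run and the max-min spread across runs."""
    aucs = [roc_auc(labels, scores) for scores in runs]
    return aucs, max(aucs) - min(aucs)


# Hypothetical ground truth and predicted scores from three runs of the
# same, identically configured experiment (made-up numbers).
labels = [1, 1, 1, 0, 0, 0]
run_a = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
run_b = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]  # identical to run_a
run_c = [0.9, 0.8, 0.6, 0.5, 0.2, 0.1]  # one score differs

aucs, spread = auc_spread(labels, [run_a, run_b, run_c])
print("AUC per run:", aucs)
print("Spread (max - min):", spread)
```

A spread of zero would mean the runs are indistinguishable by this metric; a non-zero spread between identically configured runs is the kind of residual variation the summary describes as a practical limit to reproducibility.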