FAIR‐compliant clinical, radiomics and DICOM metadata of RIDER, interobserver, Lung1 and head‐Neck1 TCIA collections


Bibliographic details
Published in: Medical physics (Lancaster) 2020-11, Vol.47 (11), p.5931-5940
Main authors: Kalendralis, Petros, Shi, Zhenwei, Traverso, Alberto, Choudhury, Ananya, Sloep, Matthijs, Zhovannik, Ivan, Starmans, Martijn P.A., Grittner, Detlef, Feltens, Peter, Monshouwer, Rene, Klein, Stefan, Fijten, Rianne, Aerts, Hugo, Dekker, Andre, Soest, Johan, Wee, Leonard
Format: Article
Language: English
Online access: Full text
Description
Summary:
Purpose: One of the most frequently cited radiomics investigations showed that features automatically extracted from routine clinical images could be used in prognostic modeling. These images have been made publicly accessible via The Cancer Imaging Archive (TCIA). There have been numerous requests for additional explanatory metadata on the following datasets: RIDER, Interobserver, Lung1, and Head–Neck1. To support repeatability, reproducibility, generalizability, and transparency in radiomics research, we publish the subjects’ clinical data, extracted radiomics features, and Digital Imaging and Communications in Medicine (DICOM) headers of these four datasets with descriptive metadata, in order to be more compliant with findable, accessible, interoperable, and reusable (FAIR) data management principles.
Acquisition and validation methods: Overall survival time intervals were updated using a national citizens registry after internal ethics board approval. Spatial offsets of the primary gross tumor volume (GTV) regions of interest (ROIs) associated with the Lung1 CT series were improved on the TCIA. GTV radiomics features were extracted using the open‐source Ontology‐Guided Radiomics Analysis Workflow (O‐RAW). We reshaped the output of O‐RAW to map features and extraction settings to the latest version of Radiomics Ontology, so as to be consistent with the Image Biomarker Standardization Initiative (IBSI). DICOM metadata were extracted using a research version of Semantic DICOM (SOHARD GmbH, Fuerth, Germany). Subjects’ clinical data were described with metadata using the Radiation Oncology Ontology. All of the above were published as Resource Description Framework (RDF) triples. Example SPARQL queries for the online triples archive are shared with the reader to illustrate how to exploit this data submission.
Data format: The accumulated RDF data are publicly accessible through a SPARQL endpoint where the triples are archived. The endpoint is remotely queried through a graph database web application at http://sparql.cancerdata.org. SPARQL queries are intrinsically federated, such that we can efficiently cross‐reference clinical, DICOM, and radiomics data within a single query, while being agnostic to the original data format and coding system. The federated queries work in the same way even if the RDF data were partitioned across multiple servers and dispersed physical locations. Potentia
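Note: the summary describes querying the RDF triples through a SPARQL endpoint. As a minimal sketch of what such a request might look like, the Python below prepares (without sending) a SPARQL 1.1 Protocol GET request. Only the base URL comes from the summary; the exact query path on that server is not stated there, so using the base URL directly is an assumption. The query itself is a generic predicate count that presumes no particular vocabulary, and it is not one of the authors' example queries. The helper names `build_count_query` and `build_request` are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Base URL quoted in the summary. The real query path (repository name,
# "/sparql" suffix, etc.) is not given there, so this value is an assumption.
ENDPOINT = "http://sparql.cancerdata.org"


def build_count_query(limit: int = 10) -> str:
    """Return a vocabulary-agnostic SPARQL query listing the most-used predicates."""
    return (
        "SELECT ?predicate (COUNT(*) AS ?uses)\n"
        "WHERE { ?subject ?predicate ?object }\n"
        "GROUP BY ?predicate\n"
        "ORDER BY DESC(?uses)\n"
        f"LIMIT {int(limit)}"
    )


def build_request(endpoint: str, query: str) -> Request:
    """Prepare, but do not send, a SPARQL 1.1 Protocol GET request."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    # Ask for the standard JSON serialization of SPARQL SELECT results.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(ENDPOINT, build_count_query(5))
print(request.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would submit it, provided the assumed endpoint path is correct. Listing predicates first is one way to discover which ontology terms an archive actually uses before writing a query that cross-references clinical, DICOM, and radiomics data.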
ISSN: 0094-2405; 2473-4209
DOI: 10.1002/mp.14322