FALP Radiology Reports: Annotated corpus for distant metastasis detection

Bibliographic details
Main authors: Ahumada, Ricardo; Dunstan, Jocelyn; Rojas, Matías; Peñafiel, Sergio; Paredes, Inti; Báez, Pablo
Format: Dataset
Language: spa
Description
Summary: A critical task in oncology is extracting information related to cancer metastasis from electronic health records. Metastasis-related information is crucial for planning treatment, evaluating patient prognoses, and conducting cancer research. Unfortunately, findings of distant metastasis are recorded in radiology reports, which are often unstructured, making it difficult to extract the relevant information automatically. In this study, we created a manually annotated clinical corpus from radiology reports of prostate, colorectal, and breast cancer patients. We developed a named entity recognition (NER) model to capture entities of distant metastasis. The entities were then used to automatically classify the reports according to the presence or absence of metastasis. The NER model detected distant metastasis mentions with a weighted average F1 score of 0.84. Whole reports were classified with an F1 score of 0.92 for documents without distant metastasis (M0) and 0.90 for documents with distant metastasis (M1). These results show the model's usefulness in detecting distant metastasis entities in three different types of cancer and in the subsequent classification of reports. The manually annotated corpus (FALP Radiology Reports Corpus) and annotation guidelines are freely released to the research community.

We are releasing the dataset in two formats:
- conll_files.zip: contains the annotated corpus in IOB2 format, split into train, test, and development subsets.
- text_ann_files.zip: contains the raw text file for each document along with its annotation file in Standoff format.

Annotation guidelines can be found in: Ricardo Ahumada, Pablo Báez, Gisselle Caamaño, Jocelyn Garay, & Inti Paredes. (2023). Annotation Guidelines for FALP radiology reports annotated corpus for distant metastasis detection (1.1). Zenodo. https://doi.org/10.5281/zenodo.7623509

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/4.0/.
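As an illustration of how the IOB2 release could be consumed, the following minimal Python sketch reads token/tag lines, groups B-/I- tags into entity mentions, and labels a report M1 if any distant metastasis entity is present and M0 otherwise. It rests on several assumptions that the record does not confirm: one whitespace-separated token and tag per line with blank lines between sentences, a hypothetical entity label named METASTASIS, and an invented two-sentence sample. The presence/absence rule is the simplest possible stand-in and is not necessarily the classifier the authors used.

```python
from typing import Iterable, Iterator, List, Tuple

Sentence = List[Tuple[str, str]]  # list of (token, IOB2 tag) pairs

# Invented sample for illustration only; it is NOT taken from the corpus.
# The label name "METASTASIS" is a hypothetical placeholder.
SAMPLE = """Múltiples O
lesiones B-METASTASIS
hepáticas I-METASTASIS
. O

Sin O
adenopatías O
. O
"""


def read_iob2(lines: Iterable[str]) -> Iterator[Sentence]:
    """Yield sentences from IOB2 lines; blank lines separate sentences.

    Assumes the token is the first column and the tag is the last column.
    """
    sentence: Sentence = []
    for raw in lines:
        line = raw.strip()
        if not line:
            if sentence:
                yield sentence
                sentence = []
            continue
        parts = line.split()
        sentence.append((parts[0], parts[-1]))
    if sentence:
        yield sentence


def extract_entities(sentence: Sentence) -> List[Tuple[str, str]]:
    """Group B-/I- tagged tokens into (label, mention text) pairs."""
    entities: List[Tuple[str, str]] = []
    label = None
    tokens: List[str] = []
    for token, tag in sentence:
        if tag.startswith("B-"):
            # A new mention starts; close any mention still open.
            if label is not None:
                entities.append((label, " ".join(tokens)))
            label, tokens = tag[2:], [token]
        elif tag.startswith("I-") and label == tag[2:]:
            tokens.append(token)
        else:
            # "O" tag, or an I- tag that does not continue the open mention
            # (invalid in strict IOB2, so it is ignored here).
            if label is not None:
                entities.append((label, " ".join(tokens)))
            label, tokens = None, []
    if label is not None:
        entities.append((label, " ".join(tokens)))
    return entities


def classify_report(sentences: Iterable[Sentence],
                    target_label: str = "METASTASIS") -> str:
    """Return "M1" if any mention carries target_label, else "M0"."""
    for sentence in sentences:
        if any(lab == target_label for lab, _ in extract_entities(sentence)):
            return "M1"
    return "M0"


sentences = list(read_iob2(SAMPLE.splitlines()))
print(extract_entities(sentences[0]))
print(classify_report(sentences))
```

To run this against the released files, the sample string would be replaced by the lines of a file from conll_files.zip, and the target label by whatever entity types the annotation guidelines define.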
DOI:10.5281/zenodo.7623395