Natural language processing pipeline to extract prostate cancer-related information from clinical notes
Objectives To develop an automated pipeline for extracting prostate cancer-related information from clinical notes. Materials and methods This retrospective study included 23,225 patients who underwent prostate MRI between 2017 and 2022. Cancer risk factors (family history of cancer and digital rect...
Gespeichert in:
Veröffentlicht in: | European radiology 2024-12, Vol.34 (12), p.7878-7891 |
---|---|
Hauptverfasser: | , , , , , , , , , , , , , , , , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Objectives
To develop an automated pipeline for extracting prostate cancer-related information from clinical notes.
Materials and methods
This retrospective study included 23,225 patients who underwent prostate MRI between 2017 and 2022. Cancer risk factors (family history of cancer and digital rectal exam findings), pre-MRI prostate pathology, and treatment history of prostate cancer were extracted from free-text clinical notes in English as binary or multi-class classification tasks. Any sentence containing pre-defined keywords was extracted from clinical notes within one year before the MRI. After manually creating sentence-level datasets with ground truth, Bidirectional Encoder Representations from Transformers (BERT)-based sentence-level models were fine-tuned using the extracted sentence as input and the category as output. The patient-level output was determined by compilation of multiple sentence-level outputs using tree-based models. Sentence-level classification performance was evaluated using the area under the receiver operating characteristic curve (AUC) on 15% of the sentence-level dataset (sentence-level test set). The patient-level classification performance was evaluated on the patient-level test set created by radiologists by reviewing the clinical notes of 603 patients. Accuracy and sensitivity were compared between the pipeline and radiologists.
Results
Sentence-level AUCs were ≥ 0.94. The pipeline showed higher patient-level sensitivity for extracting cancer risk factors (e.g., family history of prostate cancer, 96.5% vs. 77.9%,
p
|
---|---|
ISSN: | 1432-1084 0938-7994 1432-1084 |
DOI: | 10.1007/s00330-024-10812-6 |