Use of natural language processing to extract and classify papillary thyroid cancer features from surgical pathology reports
Main authors: | , , , , , , , , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | Background We aim to use Natural Language Processing (NLP) to automate the
extraction and classification of thyroid cancer risk factors from pathology
reports. Methods We analyzed 1,410 surgical pathology reports from adult
papillary thyroid cancer patients at Mayo Clinic, Rochester, MN, from 2010 to
2019. Structured and unstructured reports were used to create a
consensus-based ground truth dictionary and were categorized into modified
recurrence risk levels. Unstructured reports were narrative, while structured
reports followed standardized formats. We then developed ThyroPath, a
rule-based NLP pipeline, to extract and classify thyroid cancer features into
risk categories. Training involved 225 reports (150 structured, 75
unstructured), and testing used 170 reports (120 structured, 50 unstructured).
The pipeline's performance was assessed using both strict and
lenient criteria for accuracy, precision, recall, and F1-score. Results In
extraction tasks, ThyroPath achieved overall strict F1-scores of 93% for
structured reports and 90% for unstructured reports, covering 18 thyroid cancer
pathology features. In classification tasks, ThyroPath-extracted information
yielded an overall accuracy of 93% in categorizing reports by their
corresponding guideline-based risk of recurrence: 76.9% for high-risk, 86.8%
for intermediate-risk, and 100% for both low-risk and very-low-risk cases. By
contrast, with human-extracted pathology information, ThyroPath achieved 100%
accuracy across all thyroid cancer risk categories. Conclusions ThyroPath shows
promise in automating the extraction and recurrence risk classification of
thyroid pathology reports at large scale. It offers an alternative to laborious
manual review and could help advance virtual registries. However, it requires
further validation before implementation. |
---|---|
DOI: | 10.48550/arxiv.2406.00015 |
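
The summary reports precision, recall, and F1-score under strict and lenient criteria but does not define the two criteria. The sketch below illustrates one common way such scoring can be set up; it is not ThyroPath's implementation. It assumes that strict matching means the extracted value equals the ground-truth value after normalization, and that lenient matching accepts a value when either string contains the other. The feature names, values, and the helpers `normalize`, `is_match`, and `score` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scores:
    precision: float
    recall: float
    f1: float


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(value.lower().split())


def is_match(predicted: str, gold: str, lenient: bool) -> bool:
    """Compare one extracted value with its ground-truth value."""
    p, g = normalize(predicted), normalize(gold)
    if not p or not g:
        return False
    if lenient:
        # Assumed lenient rule: one value contains the other.
        return p in g or g in p
    # Assumed strict rule: values are identical after normalization.
    return p == g


def score(predicted: dict[str, str], gold: dict[str, str], lenient: bool = False) -> Scores:
    """Compute precision, recall, and F1 over feature -> value mappings."""
    true_pos = sum(
        1
        for feature, value in predicted.items()
        if feature in gold and is_match(value, gold[feature], lenient)
    )
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return Scores(precision, recall, f1)


# Hypothetical feature names and values, for illustration only.
gold = {
    "tumor_size": "2.1 cm",
    "extrathyroidal_extension": "minimal extrathyroidal extension",
    "lymph_nodes_positive": "3",
}
predicted = {
    "tumor_size": "2.1 cm",
    "extrathyroidal_extension": "extrathyroidal extension",
    "lymph_nodes_positive": "3",
}

strict = score(predicted, gold, lenient=False)
lenient = score(predicted, gold, lenient=True)
print(f"strict F1 = {strict.f1:.2f}, lenient F1 = {lenient.f1:.2f}")
```

In this toy case the partially extracted extension value counts as a miss under the assumed strict rule and as a hit under the assumed lenient rule, which is why a lenient score is never lower than the corresponding strict score.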