Electronic case report forms generation from pathology reports by ARGO, automatic record generator for onco-hematology

Bibliographic details
Published in: Scientific Reports 2021-12, Vol. 11 (1), p. 23823-23823, Article 23823
Main authors: Zaccaria, Gian Maria; Colella, Vito; Colucci, Simona; Clemente, Felice; Pavone, Fabio; Vegliante, Maria Carmela; Esposito, Flavia; Opinto, Giuseppina; Scattone, Anna; Loseto, Giacomo; Minoia, Carla; Rossini, Bernardo; Quinto, Angela Maria; Angiulli, Vito; Grieco, Luigi Alfredo; Fama, Angelo; Ferrero, Simone; Moia, Riccardo; Di Rocco, Alice; Quaglia, Francesca Maria; Tabanelli, Valentina; Guarini, Attilio; Ciavarella, Sabino
Format: Article
Language: English
Online access: Full text
Description
Summary: The unstructured nature of Real-World (RW) data from onco-hematological patients and the scarce accessibility to integrated systems restrain the use of RW information for research purposes. Natural Language Processing (NLP) might help in transposing unstructured reports into standardized electronic health records. We exploited NLP to develop an automated tool, named ARGO (Automatic Record Generator for Onco-hematology) to recognize information from pathology reports and populate electronic case report forms (eCRFs) pre-implemented by REDCap. ARGO was applied to hemo-lymphopathology reports of diffuse large B-cell, follicular, and mantle cell lymphomas, and assessed for accuracy (A), precision (P), recall (R) and F1-score (F) on internal (n = 239) and external (n = 93) report series. 326 (98.2%) reports were converted into corresponding eCRFs. Overall, ARGO showed high performance in capturing (1) identification report number (all metrics > 90%), (2) biopsy date (all metrics > 90% in both series), (3) specimen type (86.6% and 91.4% of A, 98.5% and 100.0% of P, 92.5% and 95.5% of F, and 87.2% and 91.4% of R for internal and external series, respectively), (4) diagnosis (100% of P with A, R and F of 90% in both series). We developed and validated a generalizable tool that generates structured eCRFs from real-life pathology reports.
ISSN: 2045-2322
DOI: 10.1038/s41598-021-03204-z
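
The summary reports accuracy (A), precision (P), recall (R) and F1-score (F) without stating how they were computed. The following is a minimal sketch using the standard textbook definitions; it is not taken from the article, and the function names `classification_metrics` and `f1_score` as well as the example counts are hypothetical. As a sanity check, it recomputes F from the P and R values quoted for specimen type and the conversion rate from the two series sizes.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard binary metrics from confusion-matrix counts.

    Assumption: the article may aggregate per-class results differently;
    this only illustrates the usual definitions.
    """
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }


# Specimen type, values quoted in the summary (P and R as fractions).
f_internal = round(100 * f1_score(0.985, 0.872), 1)  # 92.5
f_external = round(100 * f1_score(1.000, 0.914), 1)  # 95.5

# Share of reports converted into eCRFs: 326 of 239 + 93 reports.
conversion_rate = round(100 * 326 / (239 + 93), 1)   # 98.2

# Hypothetical counts, only to show the count-based helper in use.
example = classification_metrics(tp=8, fp=2, fn=2, tn=8)
```

Both recomputed F values match the figures quoted in the summary, so the reported P, R and F for specimen type are mutually consistent under the harmonic-mean definition.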