Transformers for extracting breast cancer information from Spanish clinical narratives

Bibliographic details
Published in:	Artificial intelligence in medicine 2023-09, Vol.143, p.102625-102625, Article 102625
Main authors:	Solarte-Pabón, Oswaldo; Montenegro, Orlando; García-Barragán, Alvaro; Torrente, Maria; Provencio, Mariano; Menasalvas, Ernestina; Robles, Víctor
Format:	Article
Language:	English
Subjects:	Breast cancer; Breast Neoplasms; Clinical narratives; Deep Learning; Electronic Health Records; Information Storage and Retrieval; Multilingualism; Named Entity Recognition (NER); Natural Language Processing; Natural Language Processing (NLP)
Online access:	Full text

Description
Abstract:	The wide adoption of electronic health records (EHRs) offers immense potential as a source of support for clinical research. However, previous studies focused on extracting only a limited set of medical concepts to support information extraction in the cancer domain for the Spanish language. Building on the success of deep learning for processing natural language texts, this paper proposes a transformer-based approach to extract named entities from breast cancer clinical notes written in Spanish and compares several language models. To facilitate this approach, a schema for annotating clinical notes with breast cancer concepts is presented, and a corpus for breast cancer is developed. Results indicate that both BERT-based and RoBERTa-based language models demonstrate competitive performance in clinical Named Entity Recognition (NER). Specifically, BETO and multilingual BERT achieve F-scores of 93.71% and 94.63%, respectively. Additionally, RoBERTa Biomedical attains an F-score of 95.01%, while RoBERTa BNE achieves an F-score of 94.54%. The findings suggest that transformers can feasibly extract information in the clinical domain in the Spanish language, with the use of models trained on biomedical texts contributing to enhanced results. The proposed approach takes advantage of transfer learning techniques by fine-tuning language models to automatically represent text features and avoiding the time-consuming feature engineering process.
• A deep learning-based approach for extracting breast cancer information from clinical narratives written in Spanish.
• A comprehensive annotation scheme to represent medical concepts in the breast cancer domain.
• A corpus manually annotated by clinicians for supporting clinical named entity recognition in the breast cancer domain.
• Experiments on transformer-based language models to extract information using a breast cancer corpus.
• A method to exploit transfer learning to perform cancer named entity recognition using language models that avoids feature engineering processes.
ISSN:	0933-3657, 1873-2860
DOI:	10.1016/j.artmed.2023.102625