Biomedical text readability after hypernym substitution with fine-tuned large language models

Bibliographic details
Published in:	PLOS digital health 2024-04, Vol.3 (4), p.e0000489-e0000489
Main authors:	Swanson, Karl; He, Shuhan; Calvano, Josh; Chen, David; Telvizian, Talar; Jiang, Lawrence; Chong, Paul; Schwell, Jacob; Mak, Gin; Lee, Jarone
Format:	Article
Language:	eng
Subjects:	Artificial intelligence; Automation; Benchmarks; Biology and Life Sciences; Computer and Information Sciences; Datasets; Engineering and Technology; Health education; Health literacy; Language; Large language models; Lymphoma; Medicine and Health Sciences; Natural language processing; Patient satisfaction; Readability; Reading comprehension; Semantics; Social Sciences; Terminology; Web portals
Online access:	Full text

Description
Abstract:	The advent of patient access to complex medical information online has highlighted the need for simplification of biomedical text to improve patient understanding and engagement in taking ownership of their health. However, comprehension of biomedical text remains a difficult task due to the need for domain-specific expertise. We aimed to study the simplification of biomedical text via large language models (LLMs), which are commonly used for general natural language processing tasks involving text comprehension, summarization, generation, and prediction of new text from prompts. Specifically, we fine-tuned three variants of large language models to perform substitutions of complex words and word phrases in biomedical text with a related hypernym. The output of the text substitution process using LLMs was evaluated by comparing the pre- and post-substitution texts using four readability metrics and two measures of sentence complexity. A sample of 1,000 biomedical definitions in the National Library of Medicine's Unified Medical Language System (UMLS) was processed with three LLM approaches, and each showed an improvement in readability and sentence complexity after hypernym substitution. Readability scores improved from a collegiate reading level before processing to a US high-school level after processing. Comparison between the three LLMs showed that the GPT-J-6b approach yielded the greatest improvement in measures of sentence complexity. This study demonstrates the merit of hypernym substitution to improve readability of complex biomedical text for the public and highlights the use case for fine-tuning open-access large language models for biomedical natural language processing.
ISSN:	2767-3170
DOI:	10.1371/journal.pdig.0000489
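
Illustrative note: the abstract describes comparing texts before and after hypernym substitution using readability metrics, but this record does not name the four metrics used. The sketch below assumes the Flesch-Kincaid grade level as one representative metric and uses a rough vowel-group syllable heuristic. The function names and the two example sentences are hypothetical and are not taken from the article or its code.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count vowel groups, discounting a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


if __name__ == "__main__":
    # Invented example pair: a complex term replaced by a broader, simpler hypernym.
    before = "The patient was diagnosed with immunoblastic lymphadenopathy."
    after = "The patient was diagnosed with a lymph node disease."
    print(f"before: grade {flesch_kincaid_grade(before):.1f}")
    print(f"after:  grade {flesch_kincaid_grade(after):.1f}")
```

A lower grade level after substitution would indicate easier reading under this metric. The syllable heuristic is approximate, so absolute scores may differ from those of an established readability library.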