LLM-RG4: Flexible and Factual Radiology Report Generation across Diverse Input Contexts
Format: Article
Language: English
Abstract: Drafting radiology reports is a complex task requiring flexibility, where
radiologists tailor content to the available information and to particular clinical
demands. However, most current radiology report generation (RRG) models are
constrained to a fixed task paradigm, such as predicting the full "finding"
section from a single image, which inherently involves a mismatch between inputs
and outputs. The trained models lack flexibility for diverse inputs and can
generate harmful, input-agnostic hallucinations. To bridge the gap between
current RRG models and clinical demands in practice, we first develop a
data generation pipeline to create a new MIMIC-RG4 dataset, which covers
four common radiology report drafting scenarios and has perfectly matched
inputs and outputs. Second, we propose a novel large language model (LLM) based
RRG framework, LLM-RG4, which leverages the LLM's flexible
instruction-following capability and extensive general knowledge. We further
develop an adaptive token fusion module that offers the flexibility to handle
diverse scenarios with different input combinations, while minimizing the
additional computational burden associated with increased input volumes.
In addition, we propose a token-level loss weighting strategy to direct the model's
attention toward positive and uncertain descriptions. Experimental results
demonstrate that LLM-RG4 achieves state-of-the-art performance in both clinical
efficiency and natural language generation on the MIMIC-RG4 and MIMIC-CXR
datasets. We quantitatively demonstrate that our model has minimal
input-agnostic hallucinations, whereas current open-source models commonly
suffer from this problem.
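
As a rough illustration of the token-level loss weighting idea mentioned in the abstract, the PyTorch sketch below upweights report tokens that fall inside positive or uncertain finding descriptions when computing the language-modeling loss. This is a minimal sketch, not the authors' implementation: the function name, the way the emphasis mask is obtained, and the weight value of 2.0 are assumptions.

```python
import torch
import torch.nn.functional as F

def weighted_lm_loss(logits, labels, emphasis_mask,
                     emphasis_weight=2.0, ignore_index=-100):
    """Cross-entropy over report tokens, with extra weight where emphasis_mask is 1.

    logits:        (B, T, V) language-model outputs
    labels:        (B, T)    target token ids; ignore_index marks prompt/padding
    emphasis_mask: (B, T)    1 for tokens inside positive/uncertain descriptions
    """
    # Per-token cross entropy; positions with ignore_index contribute 0.
    per_token = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        labels.reshape(-1),
        ignore_index=ignore_index,
        reduction="none",
    ).reshape(labels.shape)

    valid = (labels != ignore_index).float()
    # Weight 1.0 for ordinary tokens, emphasis_weight for emphasized ones.
    weights = 1.0 + (emphasis_weight - 1.0) * emphasis_mask.float()
    return (per_token * weights * valid).sum() / (weights * valid).sum().clamp(min=1.0)
```

In training, emphasis_mask would have to be derived by aligning report tokens with sentences labeled as positive or uncertain findings (for example, by an automatic labeler); that labeling source is likewise an assumption, not something specified in the abstract.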
DOI: 10.48550/arxiv.2412.12001