The virtual reference radiologist: comprehensive AI assistance for clinical image reading and interpretation

Detailed description

Bibliographic details
Published in:	European Radiology 2024-10, Vol. 34 (10), pp. 6652-6666
Main authors:	Siepmann, Robert; Huppertz, Marc; Rastkhiz, Annika; Reen, Matthias; Corban, Eric; Schmidt, Christian; Wilke, Stephan; Schad, Philipp; Yüksel, Can; Kuhl, Christiane; Truhn, Daniel; Nebelung, Sven
Format:	Article
Language:	English
Subjects:	Accuracy; Artificial Intelligence; Clinical Competence; Confidence; Diagnosis; Diagnostic Imaging - methods; Diagnostic Radiology; Differential diagnosis; Female; Hallucinations; Humans; Image Interpretation, Computer-Assisted - methods; Imaging; Imaging Informatics and Artificial Intelligence; Internal Medicine; Interventional Radiology; Large language models; Male; Medical imaging; Medicine; Medicine & Public Health; Middle Aged; Neuroradiology; Radiologists; Radiology; Radiology - methods; Retrospective Studies; Ultrasound; User experience; Workflow
Online access:	Full text

Description
Summary:	Objectives: Large language models (LLMs) have shown potential in radiology, but their ability to aid radiologists in interpreting imaging studies remains unexplored. We investigated the effects of a state-of-the-art LLM (GPT-4) on radiologists’ diagnostic workflow.
Materials and methods: In this retrospective study, six radiologists of different experience levels read 40 selected radiographic (n = 10), CT (n = 10), MRI (n = 10), and angiographic (n = 10) studies unassisted (session one) and assisted by GPT-4 (session two). Each imaging study was presented with demographic data, the chief complaint, and associated symptoms, and diagnoses were registered using an online survey tool. The impact of artificial intelligence (AI) on diagnostic accuracy, confidence, user experience, input prompts, and generated responses was assessed. False information was registered. Linear mixed-effect models were used to quantify the factors (fixed: experience, modality, AI assistance; random: radiologist) influencing diagnostic accuracy and confidence.
Results: When assessing whether the correct diagnosis was among the top-3 differential diagnoses, diagnostic accuracy improved slightly from 181/240 (75.4%, unassisted) to 188/240 (78.3%, AI-assisted). Similar improvements were found when only the top differential diagnosis was considered. AI assistance was used in 77.5% of the readings. Three hundred nine prompts were generated, primarily involving differential diagnoses (59.1%) and imaging features of specific conditions (27.5%). Diagnostic confidence was significantly higher when readings were AI-assisted (p < 0.001). Twenty-three responses (7.4%) were classified as hallucinations, while two (0.6%) were misinterpretations.
Conclusion: Integrating GPT-4 in the diagnostic process improved diagnostic accuracy slightly and diagnostic confidence significantly. Potentially harmful hallucinations and misinterpretations call for caution and highlight the need for further safeguarding measures.
Clinical relevance statement: Using GPT-4 as a virtual assistant when reading images made six radiologists of different experience levels feel more confident and provide more accurate diagnoses; yet, GPT-4 gave factually incorrect and potentially harmful information in 7.4% of its responses.
Key points: The benefits and dangers of GPT-4 for textual assistance in radiologic image interpretation are unclear. GPT-4’s textual assistance improved radiologists’ diagnostic accuracy from 75 to 78%.
ISSN:	0938-7994, 1432-1084
DOI:	10.1007/s00330-024-10727-2
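
Note on the reported figures: the percentages in the summary can be recomputed from the counts it states. The following is a minimal Python sketch for checking that arithmetic, not part of the study's own analysis; the variable and function names are illustrative assumptions, and only the counts (6 readers, 40 studies, 181 and 188 correct readings, 309 prompts, 23 hallucinations, 2 misinterpretations) come from the summary.

```python
# Counts taken from the summary above; all names here are illustrative.
READERS = 6
STUDIES = 40
TOTAL_READINGS = READERS * STUDIES  # readings per session

CORRECT_UNASSISTED = 181   # top-3 correct, session one
CORRECT_ASSISTED = 188     # top-3 correct, session two (GPT-4 assisted)
PROMPTS = 309              # prompts (and responses) generated
HALLUCINATIONS = 23
MISINTERPRETATIONS = 2


def pct(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage rounded to one decimal."""
    return round(100 * numerator / denominator, 1)


acc_unassisted = pct(CORRECT_UNASSISTED, TOTAL_READINGS)
acc_assisted = pct(CORRECT_ASSISTED, TOTAL_READINGS)
# Difference in percentage points, computed from the unrounded ratios.
gain_points = round(
    100 * (CORRECT_ASSISTED - CORRECT_UNASSISTED) / TOTAL_READINGS, 1
)
hallucination_rate = pct(HALLUCINATIONS, PROMPTS)
misinterpretation_rate = pct(MISINTERPRETATIONS, PROMPTS)

if __name__ == "__main__":
    print(f"Readings per session: {TOTAL_READINGS}")
    print(f"Top-3 accuracy, unassisted: {acc_unassisted}%")
    print(f"Top-3 accuracy, assisted:   {acc_assisted}%")
    print(f"Gain: {gain_points} percentage points")
    print(f"Hallucinations: {hallucination_rate}% of responses")
    print(f"Misinterpretations: {misinterpretation_rate}% of responses")
```

The recomputed values agree with the summary: the accuracy gain corresponds to seven additional correct readings out of 240. This check says nothing about statistical significance, which the study assessed with linear mixed-effect models.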