Collaboration between clinicians and vision-language models in radiology report generation


Bibliographic details
Published in: Nature Medicine, 2024-11
Main authors: Tanno, Ryutaro; Barrett, David G T; Sellergren, Andrew; Ghaisas, Sumedh; Dathathri, Sumanth; See, Abigail; Welbl, Johannes; Lau, Charles; Tu, Tao; Azizi, Shekoofeh; Singhal, Karan; Schaekermann, Mike; May, Rhys; Lee, Roy; Man, SiWai; Mahdavi, Sara; Ahmed, Zahra; Matias, Yossi; Barral, Joelle; Eslami, S M Ali; Belgrave, Danielle; Liu, Yun; Kalidindi, Sreenivasa Raju; Shetty, Shravya; Natarajan, Vivek; Kohli, Pushmeet; Huang, Po-Sen; Karthikesalingam, Alan; Ktena, Ira
Format: Article
Language: English
Online access: Full text
Description
Summary: Automated radiology report generation has the potential to improve patient care and reduce the workload of radiologists. However, the path toward real-world adoption has been stymied by the challenge of evaluating the clinical quality of artificial intelligence (AI)-generated reports. We build a state-of-the-art report generation system for chest radiographs, called Flamingo-CXR, and perform an expert evaluation of AI-generated reports by engaging a panel of board-certified radiologists. We observe a wide distribution of preferences across the panel and across clinical settings, with 56.1% of Flamingo-CXR intensive care reports evaluated to be preferable or equivalent to clinician reports, by half or more of the panel, rising to 77.7% for in/outpatient X-rays overall and to 94% for the subset of cases with no pertinent abnormal findings. Errors were observed in both human-written and Flamingo-CXR reports, with 24.8% of in/outpatient cases containing clinically significant errors in both report types, 22.8% in Flamingo-CXR reports only and 14.0% in human reports only. For reports that contain errors, we develop an assistive setting, a demonstration of clinician-AI collaboration for radiology report composition, indicating new possibilities for potential clinical utility.
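The three error percentages quoted in the summary can be combined into per-report-type totals. The short Python sketch below does that arithmetic. It assumes that all three figures share the same denominator (all in/outpatient cases) and that the categories are mutually exclusive; the derived totals are not stated in the abstract and the variable names are illustrative only.

```python
# Percentages of in/outpatient cases with clinically significant errors,
# as quoted in the summary.
both = 24.8        # errors in both report types
ai_only = 22.8     # errors in the Flamingo-CXR report only
human_only = 14.0  # errors in the human-written report only

# Derived figures (assumption: one shared denominator, disjoint categories).
ai_total = round(both + ai_only, 1)        # cases where the AI report has an error
human_total = round(both + human_only, 1)  # cases where the human report has an error
any_error = round(both + ai_only + human_only, 1)
neither = round(100.0 - any_error, 1)      # cases with no significant error in either

print(f"Flamingo-CXR reports with errors: {ai_total}%")
print(f"Human reports with errors:        {human_total}%")
print(f"At least one report with errors:  {any_error}%")
print(f"Neither report with errors:       {neither}%")
```

Under these assumptions, the share of cases with an error in at least one report is the sum of the three quoted figures, and the remainder is the share in which neither report contained a clinically significant error.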
ISSN: 1078-8956, 1546-170X
DOI:10.1038/s41591-024-03302-1