ELIXR: Towards a general purpose X-ray artificial intelligence system through alignment of large language models and radiology vision encoders

Bibliographic details
Main authors:	Xu, Shawn, Yang, Lin, Kelly, Christopher, Sieniek, Marcin, Kohlberger, Timo, Ma, Martin, Weng, Wei-Hung, Kiraly, Atilla, Kazemzadeh, Sahar, Melamed, Zakkai, Park, Jungyeon, Strachan, Patricia, Liu, Yun, Lau, Chuck, Singh, Preeti, Chen, Christina, Etemadi, Mozziyar, Kalidindi, Sreenivasa Raju, Matias, Yossi, Chou, Katherine, Corrado, Greg S, Shetty, Shravya, Tse, Daniel, Prabhakara, Shruthi, Golden, Daniel, Pilgrim, Rory, Eswaran, Krish, Sellergren, Andrew
Format:	Article
Language:	eng
Subjects:	Computer Science - Computer Vision and Pattern Recognition

Description
Summary:	In this work, we present an approach, which we call Embeddings for Language/Image-aligned X-Rays, or ELIXR, that leverages a language-aligned image encoder combined or grafted onto a fixed LLM, PaLM 2, to perform a broad range of chest X-ray tasks. We train this lightweight adapter architecture using images paired with corresponding free-text radiology reports from the MIMIC-CXR dataset. ELIXR achieved state-of-the-art performance on zero-shot chest X-ray (CXR) classification (mean AUC of 0.850 across 13 findings), data-efficient CXR classification (mean AUCs of 0.893 and 0.898 across five findings (atelectasis, cardiomegaly, consolidation, pleural effusion, and pulmonary edema) for 1% (~2,200 images) and 10% (~22,000 images) training data), and semantic search (0.76 normalized discounted cumulative gain (NDCG) across nineteen queries, including perfect retrieval on twelve of them). Compared to existing data-efficient methods including supervised contrastive learning (SupCon), ELIXR required two orders of magnitude less data to reach similar performance. ELIXR also showed promise on CXR vision-language tasks, demonstrating overall accuracies of 58.7% and 62.5% on visual question answering and report quality assurance tasks, respectively. These results suggest that ELIXR is a robust and versatile approach to CXR AI.
DOI:	10.48550/arxiv.2308.01317