OphGLM: An ophthalmology large language-and-vision assistant

Bibliographic details
Published in: Artificial intelligence in medicine, 2024-11, Vol. 157, p. 103001, Article 103001
Main authors: Deng, Zhuo; Gao, Weihao; Chen, Chucheng; Niu, Zhiyuan; Gong, Zheng; Zhang, Ruiheng; Cao, Zhenjie; Li, Fang; Ma, Zhaoyi; Wei, Wenbin; Ma, Lan
Format: Article
Language: English
Online access: Full text
Description
Summary: Vision computer-aided diagnostic methods have been used in early ophthalmic disease screening and diagnosis. However, the limited output formats of these methods lead to poor human–computer interaction and low clinical applicability. Thus, ophthalmic visual question answering is worth studying. Unfortunately, no practical solutions existed before Large Language Models (LLMs). In this paper, we investigate the ophthalmic visual diagnostic interaction problem. We construct an ophthalmology large language-and-vision assistant, OphGLM, consisting of an image encoder, a text encoder, a fusion module, and an LLM module. We establish a new Chinese ophthalmic fine-tuning dataset, FundusTuning-CN, comprising fundus instruction and conversation sets. Based on FundusTuning-CN, we establish a novel LLM-tuning strategy that introduces visual model understanding and ophthalmic knowledge into LLMs at low cost and high efficiency. Leveraging the pre-training of the image encoder, OphGLM demonstrates strong visual understanding and surpasses open-source visual language models on common fundus disease classification tasks. FundusTuning-CN enables OphGLM to surpass open-source medical LLMs in both ophthalmic knowledge and interactive capabilities. Our proposed OphGLM has the potential to revolutionize clinical applications in ophthalmology. The dataset, code, and models will be publicly available at https://github.com/ML-AILab/OphGLM.

Highlights:
• An ophthalmology large language-and-vision assistant based on LLMs and pre-trained visual diagnostic models.
• A new Chinese ophthalmic fine-tuning dataset including fundus instruction and conversation sets.
• Abundant knowledge in ophthalmology and expertise in visual image understanding.
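The abstract names four components (image encoder, text encoder, fusion module, LLM module) without implementation details. As a rough illustration only, the sketch below shows how such a pipeline could fuse image and text features into a soft prefix for an LLM. All names, dimensions, and the random-projection "encoders" are placeholders for this sketch, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
D_IMG, D_TXT, D_LLM = 256, 128, 512  # hypothetical feature dimensions

def image_encoder(pixels: np.ndarray) -> np.ndarray:
    """Stand-in for a pre-trained fundus image encoder:
    flattens the image and projects it to a feature vector."""
    w = rng.standard_normal((pixels.size, D_IMG)) / np.sqrt(pixels.size)
    return pixels.reshape(-1) @ w

def text_encoder(token_ids: list) -> np.ndarray:
    """Stand-in text encoder: mean of per-token embeddings."""
    emb = rng.standard_normal((1000, D_TXT))  # toy vocabulary of 1000 tokens
    return emb[token_ids].mean(axis=0)

def fusion_module(img_feat: np.ndarray, txt_feat: np.ndarray) -> np.ndarray:
    """Concatenate the two modalities and project into the LLM
    embedding space, yielding a soft multimodal prefix."""
    fused = np.concatenate([img_feat, txt_feat])
    w = rng.standard_normal((fused.size, D_LLM)) / np.sqrt(fused.size)
    return fused @ w

fundus = rng.random((64, 64, 3))  # dummy fundus image
question = [12, 54, 7]            # dummy token ids for a user question
prefix = fusion_module(image_encoder(fundus), text_encoder(question))
# `prefix` would be prepended to the LLM's input embeddings
```

In practice the encoders would be pre-trained networks and the fusion module a learned projection; this sketch only fixes the data flow the abstract describes.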
ISSN: 0933-3657, 1873-2860
DOI: 10.1016/j.artmed.2024.103001