OphGLM: An ophthalmology large language-and-vision assistant
Vision computer-aided diagnostic methods have been used in early ophthalmic disease screening and diagnosis. However, the limited output formats of these methods lead to poor human–computer interaction and low clinical applicability value. Thus, ophthalmic visual question answering is worth studying...
Gespeichert in:
Veröffentlicht in: | Artificial intelligence in medicine 2024-11, Vol.157, p.103001, Article 103001 |
---|---|
Hauptverfasser: | , , , , , , , , , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Vision computer-aided diagnostic methods have been used in early ophthalmic disease screening and diagnosis. However, the limited output formats of these methods lead to poor human–computer interaction and low clinical applicability value. Thus, ophthalmic visual question answering is worth studying. Unfortunately, no practical solutions exist before Large Language Models(LLMs). In this paper, we investigate the ophthalmic visual diagnostic interaction problem. We construct an ophthalmology large language-and-vision assistant, OphGLM, consisting of an image encoder, a text encoder, a fusion module, and an LLM module. We establish a new Chinese ophthalmic fine-tuning dataset, FundusTuning-CN, including the fundus instruction and conversation sets. Based on FundusTuning-CN, we establish a novel LLM-tuning strategy to introduce visual model understanding and ophthalmic knowledge into LLMs at a low cost and high efficiency. Leveraging the pre-training of the image encoder, OphGLM demonstrates strong visual understanding and surpasses open-source visual language models in common fundus disease classification tasks. The FundusTuning-CN enables OphGLM to surpass open-source medical LLMs in both ophthalmic knowledge and interactive capabilities. Our proposed OphGLM has the potential to revolutionize clinical applications in ophthalmology. The dataset, code, and models will be publicly available at https://github.com/ML-AILab/OphGLM.
•Ophthalmology large language-and-vision assistant based on LLMs and pre-training visual diagnostic models.•A new Chinese ophthalmic fine-tuning dataset including the fundus instruction and conversation sets.•Possessing abundant knowledge in ophthalmology and expertise in image visual understanding. |
---|---|
ISSN: | 0933-3657 1873-2860 1873-2860 |
DOI: | 10.1016/j.artmed.2024.103001 |