OphGLM: An ophthalmology large language-and-vision assistant

Vision-based computer-aided diagnostic methods have been used in early ophthalmic disease screening and diagnosis. However, the limited output formats of these methods lead to poor human–computer interaction and limited clinical applicability. Thus, ophthalmic visual question answering is worth studying...

Bibliographic Details
Published in: Artificial intelligence in medicine, 2024-11, Vol. 157, p. 103001, Article 103001
Authors: Deng, Zhuo; Gao, Weihao; Chen, Chucheng; Niu, Zhiyuan; Gong, Zheng; Zhang, Ruiheng; Cao, Zhenjie; Li, Fang; Ma, Zhaoyi; Wei, Wenbin; Ma, Lan
Format: Article
Language: English
Online access: Full text
Description: Vision-based computer-aided diagnostic methods have been used in early ophthalmic disease screening and diagnosis. However, the limited output formats of these methods lead to poor human–computer interaction and limited clinical applicability. Thus, ophthalmic visual question answering is worth studying. Unfortunately, no practical solutions existed before Large Language Models (LLMs). In this paper, we investigate the ophthalmic visual diagnostic interaction problem. We construct an ophthalmology large language-and-vision assistant, OphGLM, consisting of an image encoder, a text encoder, a fusion module, and an LLM module. We establish a new Chinese ophthalmic fine-tuning dataset, FundusTuning-CN, comprising fundus instruction and conversation sets. Based on FundusTuning-CN, we establish a novel LLM-tuning strategy to introduce visual model understanding and ophthalmic knowledge into LLMs at low cost and high efficiency. Leveraging the pre-training of the image encoder, OphGLM demonstrates strong visual understanding and surpasses open-source visual language models in common fundus disease classification tasks. FundusTuning-CN enables OphGLM to surpass open-source medical LLMs in both ophthalmic knowledge and interactive capabilities. Our proposed OphGLM has the potential to revolutionize clinical applications in ophthalmology. The dataset, code, and models will be publicly available at https://github.com/ML-AILab/OphGLM.
Highlights:
• An ophthalmology large language-and-vision assistant based on LLMs and pre-trained visual diagnostic models.
• A new Chinese ophthalmic fine-tuning dataset including the fundus instruction and conversation sets.
• Abundant knowledge in ophthalmology and expertise in visual image understanding.
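The description names four modules: an image encoder, a text encoder, a fusion module, and an LLM module. As a rough illustration of how such a pipeline composes, here is a minimal sketch; all class and function names are hypothetical stand-ins, not from the paper's released code, and the toy encoders only mimic the data flow, not the actual models.

```python
# Hypothetical sketch of a four-module visual question answering pipeline
# (image encoder -> fusion <- text encoder, then an LLM module), mirroring
# the composition described in the abstract. Names are illustrative only.
from dataclasses import dataclass
from typing import Callable, List

Vector = List[float]


@dataclass
class OphGLMPipeline:
    """Composes the four modules named in the abstract."""
    image_encoder: Callable[[bytes], Vector]
    text_encoder: Callable[[str], Vector]
    fusion: Callable[[Vector, Vector], Vector]
    llm: Callable[[Vector, str], str]

    def answer(self, fundus_image: bytes, question: str) -> str:
        # Encode each modality, fuse the features, then condition the LLM.
        img_feat = self.image_encoder(fundus_image)
        txt_feat = self.text_encoder(question)
        fused = self.fusion(img_feat, txt_feat)
        return self.llm(fused, question)


# Toy stand-ins so the sketch runs end to end (real systems would use
# pre-trained vision and language models here).
def toy_image_encoder(img: bytes) -> Vector:
    return [float(len(img))]


def toy_text_encoder(text: str) -> Vector:
    return [float(len(text))]


def concat_fusion(a: Vector, b: Vector) -> Vector:
    # Simplest possible fusion: feature concatenation.
    return a + b


def toy_llm(fused: Vector, question: str) -> str:
    return f"answer({question!r}, features={fused})"


pipeline = OphGLMPipeline(toy_image_encoder, toy_text_encoder,
                          concat_fusion, toy_llm)
print(pipeline.answer(b"\x00\x01", "Any signs of diabetic retinopathy?"))
```

The dataclass only fixes the interfaces between modules; the paper's actual contribution lies in the pre-trained fundus image encoder and the FundusTuning-CN tuning strategy, which this sketch does not attempt to model.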
DOI: 10.1016/j.artmed.2024.103001
Publisher: Elsevier B.V. (Netherlands)
PMID: 39490063
Rights: © 2024 The Authors. Published by Elsevier B.V. All rights reserved.
ISSN: 0933-3657
EISSN: 1873-2860
Source: MEDLINE; Elsevier ScienceDirect Journals Complete
Subjects:
Diagnosis, Computer-Assisted - methods
Eye Diseases - diagnosis
Fundus Oculi
Humans
Large language models
Ophthalmology
Ophthalmology - methods
Visual dialogue interaction