ERVQA: A Dataset to Benchmark the Readiness of Large Vision Language Models in Hospital Environments

The global shortage of healthcare workers has created demand for smart healthcare assistants that can monitor and alert healthcare workers when necessary. We examine the healthcare knowledge of existing Large Vision Language Models (LVLMs) via the Visual Question Answering (VQA) task in hospital settings through expert-annotated open-ended questions. We introduce the Emergency Room Visual Question Answering (ERVQA) dataset, consisting of <image, question, answer> triplets covering diverse emergency room scenarios, a seminal benchmark for LVLMs. By developing a detailed error taxonomy and analyzing answer trends, we reveal the nuanced nature of the task. We benchmark state-of-the-art open-source and closed LVLMs using traditional and adapted VQA metrics: Entailment Score and CLIPScore Confidence. Analyzing errors across models, we infer trends based on properties like decoder type, model size, and in-context examples. Our findings suggest the ERVQA dataset presents a highly complex task, highlighting the need for specialized, domain-specific solutions.


Bibliographic Details
Published in: arXiv.org, 2024-10
Main authors: Ray, Sourjyadip; Gupta, Kushal; Kundu, Soumi; Kasat, Payal Arvind; Aditya, Somak; Goyal, Pawan
Format: Article
Language: eng
Subjects:
Online access: Full text
EISSN: 2331-8422
Is part of: arXiv.org, 2024-10
Source: Free E-Journals
Subjects: Benchmarks; Datasets; Emergency medical services; Error analysis; Health care; Hospitals; Medical personnel; Questions; Taxonomy; Trends; Vision; Visual tasks
URL: https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-04T09%3A00%3A25IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=ERVQA:%20A%20Dataset%20to%20Benchmark%20the%20Readiness%20of%20Large%20Vision%20Language%20Models%20in%20Hospital%20Environments&rft.jtitle=arXiv.org&rft.au=Sourjyadip%20Ray&rft.date=2024-10-08&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E3115227084%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3115227084&rft_id=info:pmid/&rfr_iscdi=true