ERVQA: A Dataset to Benchmark the Readiness of Large Vision Language Models in Hospital Environments
The global shortage of healthcare workers has demanded the development of smart healthcare assistants, which can help monitor and alert healthcare workers when necessary. We examine the healthcare knowledge of existing Large Vision Language Models (LVLMs) via the Visual Question Answering (VQA) task...
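The record names two adapted VQA metrics, Entailment Score and CLIPScore Confidence, without defining them. For orientation only: CLIPScore (Hessel et al., 2021) is conventionally a rescaled, non-negative cosine similarity between an image embedding and a candidate-text embedding. The sketch below uses placeholder NumPy vectors in place of real CLIP embeddings, and the paper's specific "CLIPScore Confidence" adaptation is not described in this record — this is only the standard formulation it is presumably built on.

```python
import numpy as np

def clip_score(image_emb: np.ndarray, text_emb: np.ndarray, w: float = 2.5) -> float:
    """Standard CLIPScore: w * max(cos(image, text), 0).

    `image_emb` and `text_emb` stand in for CLIP image/text encoder
    outputs; here they are arbitrary vectors for illustration.
    """
    img = image_emb / np.linalg.norm(image_emb)   # unit-normalize image embedding
    txt = text_emb / np.linalg.norm(text_emb)     # unit-normalize text embedding
    return w * max(0.0, float(img @ txt))         # clip negatives to 0, rescale by w

# Toy usage: identical embeddings score w (2.5); orthogonal ones score 0.
print(clip_score(np.array([1.0, 0.0]), np.array([1.0, 0.0])))  # 2.5
print(clip_score(np.array([1.0, 0.0]), np.array([0.0, 1.0])))  # 0.0
```

In practice the embeddings would come from a CLIP model's image and text encoders; the rescaling constant w = 2.5 follows the original CLIPScore paper.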
Published in: | arXiv.org 2024-10 |
---|---|
Main authors: | Sourjyadip Ray, Kushal Gupta, Soumi Kundu, Payal Arvind Kasat, Somak Aditya, Pawan Goyal |
Format: | Article |
Language: | eng |
Keywords: | Benchmarks; Datasets; Emergency medical services; Error analysis; Health care; Hospitals; Medical personnel; Questions; Taxonomy; Trends; Vision; Visual tasks |
Online access: | Full text |
container_title | arXiv.org |
---|---|
creator | Sourjyadip Ray; Gupta, Kushal; Kundu, Soumi; Kasat, Payal Arvind; Somak Aditya; Goyal, Pawan |
description | The global shortage of healthcare workers has demanded the development of smart healthcare assistants, which can help monitor and alert healthcare workers when necessary. We examine the healthcare knowledge of existing Large Vision Language Models (LVLMs) via the Visual Question Answering (VQA) task in hospital settings through expert annotated open-ended questions. We introduce the Emergency Room Visual Question Answering (ERVQA) dataset, consisting of <image, question, answer> triplets covering diverse emergency room scenarios, a seminal benchmark for LVLMs. By developing a detailed error taxonomy and analyzing answer trends, we reveal the nuanced nature of the task. We benchmark state-of-the-art open-source and closed LVLMs using traditional and adapted VQA metrics: Entailment Score and CLIPScore Confidence. Analyzing errors across models, we infer trends based on properties like decoder type, model size, and in-context examples. Our findings suggest the ERVQA dataset presents a highly complex task, highlighting the need for specialized, domain-specific solutions. |
format | Article |
publisher | Ithaca: Cornell University Library, arXiv.org |
creationdate | 2024-10-08 |
rights | 2024. This work is published under http://creativecommons.org/licenses/by/4.0/ (the "License"). |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2024-10 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_3115227084 |
source | Free E-Journals |
subjects | Benchmarks; Datasets; Emergency medical services; Error analysis; Health care; Hospitals; Medical personnel; Questions; Taxonomy; Trends; Vision; Visual tasks |
title | ERVQA: A Dataset to Benchmark the Readiness of Large Vision Language Models in Hospital Environments |