ERVQA: A Dataset to Benchmark the Readiness of Large Vision Language Models in Hospital Environments
The global shortage of healthcare workers has demanded the development of smart healthcare assistants, which can help monitor and alert healthcare workers when necessary. We examine the healthcare knowledge of existing Large Vision Language Models (LVLMs) via the Visual Question Answering (VQA) task...
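The record names two adapted VQA metrics, Entailment Score and CLIPScore Confidence, without defining them. For orientation only: CLIPScore (Hessel et al., 2021) is conventionally a rescaled, non-negative cosine similarity between an image embedding and a candidate-text embedding. The sketch below uses placeholder NumPy vectors in place of real CLIP embeddings, and the paper's specific "CLIPScore Confidence" adaptation is not described in this record — this is only the standard formulation it is presumably built on.

```python
import numpy as np

def clip_score(image_emb: np.ndarray, text_emb: np.ndarray, w: float = 2.5) -> float:
    """Standard CLIPScore: w * max(cos(image, text), 0).

    `image_emb` and `text_emb` stand in for CLIP image/text encoder
    outputs; here they are arbitrary vectors for illustration.
    """
    img = image_emb / np.linalg.norm(image_emb)   # unit-normalize image embedding
    txt = text_emb / np.linalg.norm(text_emb)     # unit-normalize text embedding
    return w * max(0.0, float(img @ txt))         # clip negatives to 0, rescale by w

# Toy usage: identical embeddings score w (2.5); orthogonal ones score 0.
print(clip_score(np.array([1.0, 0.0]), np.array([1.0, 0.0])))  # 2.5
print(clip_score(np.array([1.0, 0.0]), np.array([0.0, 1.0])))  # 0.0
```

In practice the embeddings would come from a CLIP model's image and text encoders; the rescaling constant w = 2.5 follows the original CLIPScore paper.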
Published in: | arXiv.org 2024-10 |
---|---|
Main authors: | Sourjyadip Ray, Kushal Gupta, Soumi Kundu, Payal Arvind Kasat, Somak Aditya, Pawan Goyal |
Format: | Article |
Language: | eng |
Keywords: | Benchmarks; Datasets; Emergency medical services; Error analysis; Health care; Hospitals; Medical personnel; Questions; Taxonomy; Trends; Vision; Visual tasks |
Online access: | Full text |
container_title | arXiv.org |
---|---|
creator | Sourjyadip Ray; Gupta, Kushal; Kundu, Soumi; Kasat, Payal Arvind; Somak Aditya; Goyal, Pawan |
description | The global shortage of healthcare workers has demanded the development of smart healthcare assistants, which can help monitor and alert healthcare workers when necessary. We examine the healthcare knowledge of existing Large Vision Language Models (LVLMs) via the Visual Question Answering (VQA) task in hospital settings through expert annotated open-ended questions. We introduce the Emergency Room Visual Question Answering (ERVQA) dataset, consisting of <image, question, answer> triplets covering diverse emergency room scenarios, a seminal benchmark for LVLMs. By developing a detailed error taxonomy and analyzing answer trends, we reveal the nuanced nature of the task. We benchmark state-of-the-art open-source and closed LVLMs using traditional and adapted VQA metrics: Entailment Score and CLIPScore Confidence. Analyzing errors across models, we infer trends based on properties like decoder type, model size, and in-context examples. Our findings suggest the ERVQA dataset presents a highly complex task, highlighting the need for specialized, domain-specific solutions. |
format | Article |
publisher | Ithaca: Cornell University Library, arXiv.org |
creationdate | 2024-10-08 |
rights | 2024. This work is published under http://creativecommons.org/licenses/by/4.0/ (the "License"). |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2024-10 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_3115227084 |
source | Free E-Journals |
subjects | Benchmarks; Datasets; Emergency medical services; Error analysis; Health care; Hospitals; Medical personnel; Questions; Taxonomy; Trends; Vision; Visual tasks |
title | ERVQA: A Dataset to Benchmark the Readiness of Large Vision Language Models in Hospital Environments |