FastRM: An efficient and automatic explainability framework for multimodal generative models
While Large Vision Language Models (LVLMs) have become highly capable at reasoning over human prompts and visual inputs, they remain prone to producing responses that contain misinformation. Identifying incorrect responses that are not grounded in evidence has become a crucial task in building trustworthy AI. Explainability methods such as gradient-based relevancy maps on LVLM outputs can provide insight into the decision process of models; however, these methods are often computationally expensive and not suited for on-the-fly validation of outputs. In this work, we propose FastRM, an effective method for predicting the explainable relevancy maps of LVLMs. Experimental results show that employing FastRM leads to a 99.8% reduction in compute time for relevancy map generation and a 44.4% reduction in memory footprint for the evaluated LVLM, making explainable AI more efficient and practical and thereby facilitating its deployment in real-world applications.
Saved in:
| Main authors: | Stan, Gabriela Ben-Melech; Aflalo, Estelle; Luo, Man; Rosenman, Shachar; Le, Tiep; Paul, Sayak; Tseng, Shao-Yen; Lal, Vasudev |
|---|---|
| Format: | Article |
| Language: | English |
| Subjects: | Computer Science - Artificial Intelligence |
| Online access: | Order full text |
| Field | Value |
|---|---|
| creator | Stan, Gabriela Ben-Melech; Aflalo, Estelle; Luo, Man; Rosenman, Shachar; Le, Tiep; Paul, Sayak; Tseng, Shao-Yen; Lal, Vasudev |
| description | While Large Vision Language Models (LVLMs) have become highly capable at reasoning over human prompts and visual inputs, they remain prone to producing responses that contain misinformation. Identifying incorrect responses that are not grounded in evidence has become a crucial task in building trustworthy AI. Explainability methods such as gradient-based relevancy maps on LVLM outputs can provide insight into the decision process of models; however, these methods are often computationally expensive and not suited for on-the-fly validation of outputs. In this work, we propose FastRM, an effective method for predicting the explainable relevancy maps of LVLMs. Experimental results show that employing FastRM leads to a 99.8% reduction in compute time for relevancy map generation and a 44.4% reduction in memory footprint for the evaluated LVLM, making explainable AI more efficient and practical and thereby facilitating its deployment in real-world applications. |
| doi_str_mv | 10.48550/arxiv.2412.01487 |
| format | Article |
| creationdate | 2024-12-02 |
| rights | http://creativecommons.org/licenses/by/4.0 (open access) |
| fulltext | fulltext_linktorsrc |
| identifier | DOI: 10.48550/arxiv.2412.01487 |
| language | eng |
| recordid | cdi_arxiv_primary_2412_01487 |
| source | arXiv.org |
| subjects | Computer Science - Artificial Intelligence |
| title | FastRM: An efficient and automatic explainability framework for multimodal generative models |
| url | https://arxiv.org/abs/2412.01487 |