FastRM: An efficient and automatic explainability framework for multimodal generative models
Saved in:

Main authors: , , , , , , ,
Format: Article
Language: eng
Subject terms:
Online access: Order full text
Abstract: While Large Vision Language Models (LVLMs) have become highly capable at reasoning over human prompts and visual inputs, they are still prone to producing responses that contain misinformation. Identifying incorrect responses that are not grounded in evidence has become a crucial task in building trustworthy AI. Explainability methods such as gradient-based relevancy maps on LVLM outputs can provide insight into the decision process of models; however, these methods are often computationally expensive and not suited for on-the-fly validation of outputs. In this work, we propose FastRM, an effective method for predicting the explainable Relevancy Maps of LVLMs. Experimental results show that employing FastRM leads to a 99.8% reduction in compute time for relevancy map generation and a 44.4% reduction in memory footprint for the evaluated LVLM, making explainable AI more efficient and practical, thereby facilitating its deployment in real-world applications.
DOI: 10.48550/arxiv.2412.01487
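
The abstract contrasts expensive gradient-based relevancy maps with FastRM's fast prediction. As a rough, self-contained illustration of why a gradient-based map is costly (one backward pass per explained output token), here is a minimal toy sketch; the `ToyLVLM` module, its dimensions, and the random patch-embedding inputs are assumptions for illustration only and are not the paper's model, its code, or FastRM itself.

```python
# Toy illustration (not FastRM): a gradient-based relevancy map obtained by
# backpropagating one output score to the input patch embeddings.
import torch
import torch.nn as nn

torch.manual_seed(0)

class ToyLVLM(nn.Module):
    """Hypothetical stand-in for an LVLM: patch embeddings in, token logits out."""
    def __init__(self, dim=64, vocab=1000):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, vocab)

    def forward(self, patch_embeds):            # (B, n_patches, dim)
        h = self.encoder(patch_embeds)
        return self.head(h.mean(dim=1))         # (B, vocab)

model = ToyLVLM().eval()
patches = torch.randn(1, 16, 64, requires_grad=True)   # stand-in image patches

logits = model(patches)
target_token = logits[0].argmax()               # output token to explain

# One full backward pass per explained token: the costly step that a
# directly predicted relevancy map would avoid.
logits[0, target_token].backward()

relevancy = patches.grad.norm(dim=-1).squeeze(0)        # one score per patch
relevancy = relevancy / relevancy.sum()                  # normalize into a map
print(relevancy)
```

In a real LVLM this backward pass runs through billions of parameters for every token to be explained, which is the kind of overhead the abstract's reported 99.8% compute-time reduction refers to.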