MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs
Format: | Article |
Language: | English |
Online access: | Order full text |
Abstract: | We introduce MIA-Bench, a new benchmark designed to evaluate multimodal large
language models (MLLMs) on their ability to strictly adhere to complex
instructions. Our benchmark comprises a diverse set of 400 image-prompt pairs,
each crafted to challenge the models' compliance with layered instructions in
generating accurate responses that satisfy specific requested patterns.
Evaluation results from a wide array of state-of-the-art MLLMs reveal
significant variations in performance, highlighting areas for improvement in
instruction fidelity. Additionally, we create extra training data and explore
supervised fine-tuning to enhance the models' ability to strictly follow
instructions without compromising performance on other tasks. We hope this
benchmark not only serves as a tool for measuring MLLM adherence to
instructions, but also guides future developments in MLLM training methods. |
DOI: | 10.48550/arxiv.2407.01509 |
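
As a rough illustration of the evaluation protocol the abstract describes, scoring a model's compliance with the layered sub-instructions in each image-prompt pair, here is a minimal Python sketch. The file layout, field names, and the `query_model` and `satisfies` stubs are assumptions made for illustration, not the benchmark's actual release format or API; in particular, the paper grades compliance with a strong LLM as judge, for which a naive keyword check stands in here as a placeholder.

```python
import json
from pathlib import Path
from statistics import mean

# Hypothetical sketch of a MIA-Bench-style evaluation loop.
# File layout and field names are assumptions for illustration;
# the actual benchmark release may differ.

def query_model(image_path: str, prompt: str) -> str:
    """Stub: send the image and prompt to the MLLM under test."""
    raise NotImplementedError("wire up your model's API here")

def satisfies(response: str, sub_instruction: str) -> bool:
    """Stub: judge whether the response follows one sub-instruction.
    A keyword check is shown only as a placeholder for an LLM judge."""
    return sub_instruction.lower() in response.lower()

def evaluate(benchmark_file: str) -> float:
    pairs = json.loads(Path(benchmark_file).read_text())
    scores = []
    for pair in pairs:  # assumed keys: image, prompt, sub_instructions
        response = query_model(pair["image"], pair["prompt"])
        # Compliance = fraction of layered sub-instructions satisfied.
        scores.append(mean(satisfies(response, s)
                           for s in pair["sub_instructions"]))
    return mean(scores)

if __name__ == "__main__":
    print(f"mean compliance: {evaluate('mia_bench.json'):.3f}")
```

Averaging per-pair compliance before averaging across pairs keeps each of the 400 image-prompt pairs equally weighted, regardless of how many sub-instructions it contains.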