A Survey on Multimodal Benchmarks: In the Era of Large AI Models
Main authors: | , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Abstract: | The rapid evolution of Multimodal Large Language Models (MLLMs) has brought
substantial advancements in artificial intelligence, significantly enhancing
the capability to understand and generate multimodal content. While prior
studies have largely concentrated on model architectures and training
methodologies, a thorough analysis of the benchmarks used to evaluate these
models is still lacking. This survey addresses this gap by systematically
reviewing 211 benchmarks that assess MLLMs across four core domains:
understanding, reasoning, generation, and application. We provide a detailed
analysis of task designs, evaluation metrics, and dataset construction across
diverse modalities. We hope that this survey will contribute to the ongoing
advancement of MLLM research by offering a comprehensive overview of
benchmarking practices and identifying promising directions for future work. An
associated GitHub repository collecting the latest papers is available. |
DOI: | 10.48550/arxiv.2409.18142 |