MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation

Large Multimodal Models (LMMs) exhibit impressive cross-modal understanding and reasoning abilities, often assessed through multiple-choice questions (MCQs) that include an image, a question, and several options. However, many benchmarks used for such evaluations suffer from systematic biases. Remar...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: Huang, Jinsheng, Chen, Liang, Guo, Taian, Zeng, Fu, Zhao, Yusheng, Wu, Bohan, Yuan, Ye, Zhao, Haozhe, Guo, Zhihui, Zhang, Yichi, Yuan, Jingyang, Ju, Wei, Liu, Luchen, Liu, Tianyu, Chang, Baobao, Zhang, Ming
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!