VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents

Large Multimodal Models (LMMs) have ushered in a new era in artificial intelligence, merging capabilities in both language and vision to form highly capable Visual Foundation Agents. These agents are postulated to excel across a myriad of tasks, potentially approaching general artificial intelligenc...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: Liu, Xiao, Zhang, Tianjie, Gu, Yu, Iong, Iat Long, Xu, Yifan, Song, Xixuan, Zhang, Shudan, Lai, Hanyu, Liu, Xinyi, Zhao, Hanlin, Sun, Jiadai, Yang, Xinyue, Yang, Yu, Qi, Zehan, Yao, Shuntian, Sun, Xueqiao, Cheng, Siyi, Zheng, Qinkai, Yu, Hao, Zhang, Hanchen, Hong, Wenyi, Ding, Ming, Pan, Lihang, Gu, Xiaotao, Zeng, Aohan, Du, Zhengxiao, Song, Chan Hee, Su, Yu, Dong, Yuxiao, Tang, Jie
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!