DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding

We present DeepSeek-VL2, an advanced series of large Mixture-of-Experts (MoE) Vision-Language Models that significantly improves upon its predecessor, DeepSeek-VL, through two major upgrades. For the vision component, we incorporate a dynamic tiling vision encoding strategy designed for processi...

Bibliographic details
Main authors: Wu, Zhiyu; Chen, Xiaokang; Pan, Zizheng; Liu, Xingchao; Liu, Wen; Dai, Damai; Gao, Huazuo; Ma, Yiyang; Wu, Chengyue; Wang, Bingxuan; Xie, Zhenda; Wu, Yu; Hu, Kai; Wang, Jiawei; Sun, Yaofeng; Li, Yukun; Piao, Yishi; Guan, Kang; Liu, Aixin; Xie, Xin; You, Yuxiang; Dong, Kai; Yu, Xingkai; Zhang, Haowei; Zhao, Liang; Wang, Yisong; Ruan, Chong
Format: Article
Language: English