MM-VID: Advancing Video Understanding with GPT-4V(ision)

We present MM-VID, an integrated system that harnesses the capabilities of GPT-4V, combined with specialized tools in vision, audio, and speech, to facilitate advanced video understanding. MM-VID is designed to address the challenges posed by long-form videos and intricate tasks such as reasoning wi...

Bibliographic details
Published in: arXiv.org 2023-10
Main authors: Lin, Kevin; Ahmed, Faisal; Li, Linjie; Lin, Chung-Ching; Azarnasab, Ehsan; Yang, Zhengyuan; Wang, Jianfeng; Lin, Liang; Liu, Zicheng; Lu, Yumao; Liu, Ce; Wang, Lijuan
Format: Article
Language: English
Online access: Full text