Pegasus-v1 Technical Report


Bibliographic Details
Published in: arXiv.org, 2024-04
Main authors: Jung, Raehyuk, Go, Hyojun, Yi, Jaehyuk, Jang, Jiho, Kim, Daniel, Suh, Jay, Lee, Aiden, Cooper, Han, Lee, Jae, Kim, Jeff, Jin-Young, Kim, Kim, Junwan, Park, Kyle, Lee, Lucas, Mars Ha, Seo, Minjoon, Abraham, Jo, Park, Ed, Kianinejad, Hassan, Kim, S J, Moon, Tony, Jeong, Wade, Popescu, Andrei, Kim, Esther, Yoon, E K, Heo, Genie, Choi, Henry, Kang, Jenna, Han, Kevin, Seo, Noah, Nguyen, Sunny, Ryan, Won, Park, Yeonhoo, Giuliani, Anthony, Chung, Dave, Yoon, Hans, Le, James, Ahn, Jenny, Lee, June, Saini, Maninder, Sanders, Meredith, Lee, Soyoung, Kim, Sue, Couture, Travis
Format: Article
Language: English
Description
Abstract: This technical report introduces Pegasus-1, a multimodal language model specialized in understanding video content and interacting with it through natural language. Pegasus-1 is designed to address the unique challenges posed by video data, such as interpreting spatiotemporal information, in order to offer nuanced comprehension of videos of varying lengths. This technical report gives an overview of Pegasus-1's architecture, its training strategies, and its performance on benchmarks for video conversation, zero-shot video question answering, and video summarization. We also explore qualitative characteristics of Pegasus-1, demonstrating its capabilities as well as its limitations, in order to provide readers with a balanced view of its current state and its future direction.
ISSN: 2331-8422