Synchronized Video Storytelling: Generating Video Narrations with Structured Storyline
Main Authors: | , , , , , , |
---|---|
Format: | Article |
Language: | eng |
Summary: | https://aclanthology.org/2024.acl-long.513/ Video storytelling is engaging multimedia content that utilizes video and its
accompanying narration to attract the audience, where a key challenge is
creating narrations for recorded visual scenes. Previous studies on dense video
captioning and video story generation have made some progress. However, in
practical applications, we typically require synchronized narrations for
ongoing visual scenes. In this work, we introduce a new task of Synchronized
Video Storytelling, which aims to generate synchronous and informative
narrations for videos. These narrations, associated with each video clip,
should relate to the visual content, integrate relevant knowledge, and have an
appropriate word count corresponding to the clip's duration. Specifically, a
structured storyline is beneficial to guide the generation process, ensuring
coherence and integrity. To support the exploration of this task, we introduce
a new benchmark dataset E-SyncVidStory with rich annotations. Since existing
Multimodal LLMs are not effective in addressing this task in one-shot or
few-shot settings, we propose a framework named VideoNarrator that can generate
a storyline for input videos and simultaneously generate narrations with the
guidance of the generated or predefined storyline. We further introduce a set
of evaluation metrics to thoroughly assess the generation. Both automatic and
human evaluations validate the effectiveness of our approach. Our dataset,
codes, and evaluations will be released. |
---|---|
DOI: | 10.48550/arxiv.2405.14040 |
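
The summary states that each narration should have a word count appropriate to its clip's duration. A minimal sketch of that constraint follows. It is not the authors' VideoNarrator code: the `Clip` type, the `word_budget` and `fits` helpers, and the speaking rate of 2.5 words per second are all assumptions made for illustration, and counting words by whitespace only suits whitespace-delimited text.

```python
from dataclasses import dataclass

# Assumed speaking rate; the summary does not specify one.
WORDS_PER_SECOND = 2.5


@dataclass
class Clip:
    """Hypothetical clip record: start and end times in seconds."""
    start: float
    end: float

    @property
    def duration(self) -> float:
        return self.end - self.start


def word_budget(clip: Clip, words_per_second: float = WORDS_PER_SECOND) -> int:
    """Upper bound on narration length, in words, for one clip."""
    if clip.duration <= 0:
        raise ValueError("clip must have positive duration")
    # Always allow at least one word, even for very short clips.
    return max(1, int(clip.duration * words_per_second))


def fits(narration: str, clip: Clip,
         words_per_second: float = WORDS_PER_SECOND) -> bool:
    """True if the narration stays within the clip's word budget."""
    # Whitespace splitting is an assumption; other languages need a tokenizer.
    return len(narration.split()) <= word_budget(clip, words_per_second)
```

A generation pipeline could pass `word_budget(clip)` to the model as a length hint and use `fits` afterwards to reject or trim narrations that would overrun the clip.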