Deep sequential collaborative cognition of vision and language based model for video description

Video description aims to translate video into natural language using appropriate sentence patterns and well-chosen words. The task is challenging because of the large semantic gap between visual content and language. Many well-designed models have been developed in recent years. However, the language information is often...

Bibliographic details
Published in: Multimedia Tools and Applications, 2023-09, Vol. 82 (23), p. 36207-36230
Main authors: Tang, Pengjie; Tan, Yunlan; Xia, Jiewu
Format: Article
Language: English
Online access: Full text