Deep sequential collaborative cognition of vision and language based model for video description

Video description aims to translate a video into natural language using appropriate sentence patterns and suitable wording. The task is challenging because of the large semantic gap between visual content and language. Many well-designed models have been developed recently. However, the language information is often...

Bibliographic details
Published in:	Multimedia tools and applications 2023-09, Vol.82 (23), p.36207-36230
Main authors:	Tang, Pengjie, Tan, Yunlan, Xia, Jiewu
Format:	Article
Language:	English
Subjects:	Coding Cognition Cognition & reasoning Collaboration Computer Communication Networks Computer Science Data Structures and Information Theory Datasets Language Learning Multimedia Information Systems Natural language processing Semantics Sentences Special Purpose and Application-Based Systems Vision
Online access:	Full text