Deep sequential collaborative cognition of vision and language based model for video description
Video description is to translate video into natural language with appropriate sentence patterns and decent words. The task is challenging due to the great semantic gap between visual content and language. Nowadays, many well-designed models are developed. However, the language information is often...
Gespeichert in:
Veröffentlicht in: | Multimedia tools and applications 2023-09, Vol.82 (23), p.36207-36230 |
---|---|
Hauptverfasser: | , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Schreiben Sie den ersten Kommentar!