Language-guided video target anaphora segmentation method

Bibliographic details
Main authors: WANG RONG, BI YIHAN, SONG ZHENFENG, LI CHONG, TAN QUANGE, SUN HAICHUN
Format: Patent
Language: Chinese; English
Description
Abstract: The invention provides a language-guided video target anaphora segmentation method (that is, referring video object segmentation), comprising the following steps. In the visual feature encoding stage, a language-embedded visual encoder densely embeds language features into visual features, achieving early cross-modal interaction between vision and language and extracting joint features that carry rich linguistic and visual information. An information exchange and feature refinement module then performs multi-path information exchange and fine-grained feature optimization across different frames; by capturing fine-grained semantic cues, it improves the model's ability to accurately segment the referred target in complex scenes. In the decoding stage, a semantic-guided feature fusion module uses high-level semantic information as guidance to integrate and fuse multi-level joint features and to aggregate target-level semantic information. High-level semantic information is used as guidance, multi-level joint
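The abstract names three modules but does not say how any of them is computed. The following is a minimal pure-Python sketch of one plausible reading, offered only to make the data flow concrete. It assumes that dense language embedding is done with scaled dot-product cross-attention from each visual position to the word features, that inter-frame information exchange is a simple blend of each frame with the mean over all frames, and that semantic guidance is a softmax weighting of feature levels scored against a high-level guidance vector. The function names (`language_embed`, `cross_frame_exchange`, `semantic_guided_fusion`), the blending weight `alpha`, and the vector `guide` are hypothetical and do not come from the patent.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def language_embed(visual: List[Vector], words: List[Vector]) -> List[Vector]:
    """Hypothetical early fusion: every visual position attends over all words
    (scaled dot-product attention) and adds the attended language vector
    back residually, yielding one joint vision-language feature per position."""
    dim = len(words[0])
    scale = 1.0 / math.sqrt(dim)
    joint = []
    for v in visual:
        weights = softmax([dot(v, w) * scale for w in words])
        attended = [sum(a * w[d] for a, w in zip(weights, words)) for d in range(dim)]
        joint.append([x + y for x, y in zip(v, attended)])
    return joint


def cross_frame_exchange(frames: List[List[Vector]], alpha: float = 0.5) -> List[List[Vector]]:
    """Hypothetical inter-frame exchange: blend each frame's feature at every
    position with the mean over all frames at that position."""
    n, num_pos, dim = len(frames), len(frames[0]), len(frames[0][0])
    mean = [[sum(f[p][d] for f in frames) / n for d in range(dim)] for p in range(num_pos)]
    return [
        [[(1 - alpha) * f[p][d] + alpha * mean[p][d] for d in range(dim)] for p in range(num_pos)]
        for f in frames
    ]


def semantic_guided_fusion(levels: List[List[Vector]], guide: Vector) -> List[Vector]:
    """Hypothetical decoder fusion: score each feature level by how well its
    pooled feature matches the high-level guidance vector, then take the
    softmax-weighted sum of the levels (all levels share positions and dim)."""
    num_pos, dim = len(levels[0]), len(levels[0][0])
    scores = []
    for level in levels:
        pooled = [sum(v[d] for v in level) / num_pos for d in range(dim)]
        scores.append(dot(pooled, guide))
    weights = softmax(scores)
    return [
        [sum(w * level[p][d] for w, level in zip(weights, levels)) for d in range(dim)]
        for p in range(num_pos)
    ]


# Toy example: 2-D features, two words, two single-position frames.
words = [[1.0, 0.0], [0.0, 1.0]]
frame_a = language_embed([[2.0, 0.0]], words)
frame_b = language_embed([[0.0, 2.0]], words)
exchanged = cross_frame_exchange([frame_a, frame_b])
fused = semantic_guided_fusion([[[1.0, 0.0]], [[0.0, 1.0]]], guide=[1.0, 1.0])
print(exchanged)
print(fused)  # equal scores, so each level gets weight 0.5
```

In the toy run, each visual position is pulled toward the word it is most similar to, the two frames are then blended toward their shared mean, and the two decoder levels receive equal weight because they match the guidance vector equally well. Cross-attention is only one common way to realize dense language embedding; the patented encoder may use a different mechanism.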