Language-guided video target anaphora segmentation method

Detailed description

Bibliographic details
Main authors:	WANG RONG, BI YIHAN, SONG ZHENFENG, LI CHONG, TAN QUANGE, SUN HAICHUN
Format:	Patent
Language:	Chinese; English
Subjects:	CALCULATING; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS; COMPUTING; COUNTING; PHYSICS

Description
Summary:	The invention provides a language-guided video target anaphora segmentation method comprising the following steps. In the visual feature encoding stage, a language-embedded visual encoder densely embeds language features into visual features, realizing early-stage cross-modal interaction between vision and language and extracting joint features rich in both linguistic and visual information. An information exchange and feature refining module is then constructed; by performing multi-path information exchange and fine-grained feature optimization across different frames, it captures fine-grained semantic cues and improves the model's ability to accurately segment the reference target in complex scenes. In the decoding stage, a semantic-guided feature fusion module is provided, which uses high-level semantic information to guide the fusion of multi-level joint features and aggregates target-level semantic information. High-level semantic information is used as guidance, multi-level joint
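
The summary does not say how language features are embedded into visual features. One common way to realize this kind of early cross-modal interaction is residual cross-attention, in which every visual position attends to all word features. The following is a minimal sketch under that assumption, not the patented encoder: the function name `embed_language`, the toy feature values, and the absence of learned projection matrices are all illustrative choices.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def embed_language(visual, language):
    """Hypothetical residual cross-attention (assumed mechanism).

    visual:   list of visual feature vectors, one per spatial position
    language: list of word feature vectors with the same dimension
    Returns one joint feature vector per visual position.
    """
    dim = len(language[0])
    scale = 1.0 / math.sqrt(dim)
    joint = []
    for v in visual:
        # Scaled dot-product similarity between this position and each word.
        scores = [scale * sum(a * b for a, b in zip(v, w)) for w in language]
        weights = softmax(scores)
        # Attention-weighted sum of word features.
        context = [
            sum(wt * w[i] for wt, w in zip(weights, language))
            for i in range(dim)
        ]
        # Residual connection keeps the original visual information.
        joint.append([vi + ci for vi, ci in zip(v, context)])
    return joint


# Toy data (made up for illustration): 3 visual positions, 2 words, dimension 2.
visual = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
language = [[1.0, 0.0], [0.0, 1.0]]
joint = embed_language(visual, language)
```

In this toy run, a position whose feature matches the first word receives more of that word's feature than of the second word's, while a position equally similar to both words receives their average. A real encoder would apply such a step at several stages of the visual backbone and with learned query, key, and value projections.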