Language-guided video target anaphora segmentation method
Main authors: | , , , , , |
---|---|
Format: | Patent |
Language: | chi ; eng |
Subjects: | |
Online access: | Order full text |
Summary: | The invention provides a language-guided video target anaphora segmentation method comprising the following steps. In the visual feature encoding stage, a language-embedded visual encoder densely embeds language features into the visual features, realizing early cross-modal interaction between vision and language and extracting joint features that are rich in both linguistic and visual information. An information exchange and feature refinement module performs multi-path information exchange and fine-grained feature optimization across different frames to capture fine-grained semantic cues, improving the model's ability to accurately segment the referred target in complex scenes. In the decoding stage, a semantic-guided feature fusion module uses high-level semantic information to guide the integration and fusion of multi-level joint features and aggregates target-level semantic information. High-level semantic information is used as guidance, multi-level joint |
---|
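The early cross-modal interaction described in the summary can be illustrated with a minimal sketch. The summary does not state which fusion operator the language-embedded visual encoder uses, so the scaled dot-product cross-attention below is an assumption chosen for illustration, not the patented method. The function name `embed_language`, the tensor shapes, and the residual connection are likewise hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def embed_language(visual, words):
    """Embed language features into visual features (hypothetical sketch).

    visual: (N, C) array, one row per spatial position of a frame.
    words:  (T, C) array, one row per word of the referring expression.
    Returns the joint features (N, C) and the attention weights (N, T).
    """
    scale = np.sqrt(visual.shape[-1])
    # Each spatial position scores every word (assumed fusion operator).
    attn = softmax(visual @ words.T / scale, axis=-1)
    # Residual add keeps the visual content and injects language context.
    joint = visual + attn @ words
    return joint, attn


# Toy inputs: 16 spatial positions, 5 words, 8 channels.
rng = np.random.default_rng(0)
visual = rng.standard_normal((16, 8))
words = rng.standard_normal((5, 8))
joint, attn = embed_language(visual, words)
```

In this sketch, applying the same operation at every encoder stage would correspond to the "dense" embedding the summary mentions; how many stages are used is not stated in the record.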