Image dense description method based on window self-attention and multi-scale mechanism

Bibliographic details
Main authors: DENG HONGYU, WANG QI, ZHANG BANGMEI, WANG JIANJUN, WU XUE
Format: Patent
Language: Chinese; English
Description
Summary: The invention discloses an image dense description method based on window self-attention and a multi-scale mechanism. The model combines a target detector and a region description generator. In the target detector, a window-attention-based feature encoder learns and extracts a representation of the input image. The feature encoder is a stack of 12 ViT modules; in each module the image feature map is divided into several windows of equal size, and attention is computed within each window. The feature encoder produces image features at five different scales, and a target detection head predicts the position of each key region. Regional features are cropped from the multi-scale features, the region description generator uses a pre-trained BERT model as its core, and regional d
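
The window attention step named in the summary (divide the feature map into equal-sized windows, then attend only within each window) can be illustrated with a minimal NumPy sketch. This is not the patented model: it assumes a single attention head, no positional bias, and a feature map whose height and width are exact multiples of the window size. The function names and the projection matrices Wq, Wk and Wv are hypothetical and chosen only for this illustration.

```python
import numpy as np

def window_partition(x, window):
    """Split an (H, W, C) feature map into non-overlapping window x window tiles.

    Returns an array of shape (num_windows, window * window, C).
    """
    H, W, C = x.shape
    assert H % window == 0 and W % window == 0, "H and W must be multiples of window"
    x = x.reshape(H // window, window, W // window, window, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, window * window, C)

def window_reverse(windows, window, H, W):
    """Inverse of window_partition: reassemble the tiles into an (H, W, C) map."""
    C = windows.shape[-1]
    x = windows.reshape(H // window, W // window, window, window, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(H, W, C)

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def window_self_attention(x, window, Wq, Wk, Wv):
    """Single-head self-attention restricted to each window (illustrative only)."""
    H, W, _ = x.shape
    tokens = window_partition(x, window)             # (N, T, C)
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv  # hypothetical projections
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    out = softmax(scores) @ v                        # tokens attend within their window only
    return window_reverse(out, window, H, W)
```

For an 8 x 8 map with 4 channels and a window size of 4, window_partition yields 4 windows of 16 tokens each, and changing a pixel in one window leaves the attention output of the other three windows unchanged. That locality is what keeps the cost of attention proportional to the number of windows rather than to the square of the full token count.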