Image dense description method based on window self-attention and a multi-scale mechanism

Bibliographic details
Main authors:	DENG HONGYU, WANG QI, ZHANG BANGMEI, WANG JIANJUN, WU XUE
Format:	Patent
Language:	Chinese; English
Subjects:	CALCULATING; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS; COMPUTING; COUNTING; PHYSICS

Description
Summary:	The invention discloses an image dense description method based on window self-attention and a multi-scale mechanism. The method combines a target detector and a region description generator and comprises the following steps. In the target detector, a window-attention-based feature encoder performs image representation learning and feature extraction on the input image. The feature encoder is a stack of 12 ViT layers; in each layer, the image feature map is divided into several windows of equal size, and attention is computed within each window. The feature encoder computes image features at five different scales, and a target detection head predicts the position information of the key regions. In the model, regional features are cropped from the multi-scale features, the region description generator uses a pre-trained BERT model as its core, and regional d
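
The window attention step named in the summary (divide the feature map into equal windows, then run attention inside each window) can be illustrated with a minimal NumPy sketch. This is not the patented implementation: the function names, the single attention head, the window size, and the random projection matrices are all assumptions made for illustration, and the record does not say whether shifted windows or positional biases are used.

```python
import numpy as np

def window_partition(x, window):
    """Split an (H, W, C) feature map into non-overlapping window x window tiles.

    Returns an array of shape (num_windows, window * window, C).
    """
    h, w, c = x.shape
    assert h % window == 0 and w % window == 0, "H and W must be multiples of window"
    x = x.reshape(h // window, window, w // window, window, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (rows, cols, window, window, C)
    return x.reshape(-1, window * window, c)

def window_reverse(tiles, window, h, w):
    """Inverse of window_partition: rebuild an (H, W, C) map from the tiles."""
    c = tiles.shape[-1]
    x = tiles.reshape(h // window, w // window, window, window, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (rows, window, cols, window, C)
    return x.reshape(h, w, c)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def window_self_attention(x, window, wq, wk, wv):
    """Single-head self-attention restricted to each window (hypothetical helper)."""
    h, w, _ = x.shape
    tiles = window_partition(x, window)            # (N, T, C)
    q, k, v = tiles @ wq, tiles @ wk, tiles @ wv   # (N, T, D) each
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])  # (N, T, T)
    out = softmax(scores) @ v                      # tokens attend only within their window
    return window_reverse(out, window, h, w)

# Example: an 8x8 map with 4 channels, split into four 4x4 windows.
rng = np.random.default_rng(0)
feature_map = rng.normal(size=(8, 8, 4))
wq, wk, wv = (rng.normal(size=(4, 4)) for _ in range(3))
attended = window_self_attention(feature_map, 4, wq, wk, wv)
```

Because attention is computed per window, the cost grows with the number of windows times the square of the tokens per window, rather than with the square of all tokens in the map, and a change to one window's input leaves the outputs of the other windows unchanged.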