Image dense description method based on window self-attention and multi-scale mechanism
Main authors: | , , , , |
---|---|
Format: | Patent |
Language: | chi ; eng |
Subjects: | |
Summary: | The invention discloses an image dense description method based on window self-attention and a multi-scale mechanism. The method combines a target detector with a region description generator. In the target detector, a window-attention-based feature encoder performs image representation learning and feature extraction on the input image. The feature encoder is a stack of 12 ViT layers; in each layer, the image feature map is divided into several windows of equal size and attention is computed within each window. The feature encoder computes image features at five different scales, and a target detection head predicts the position information of the key regions. Region features are cropped from the multi-scale features, the region description generator adopts a pre-trained BERT model as its core, and regional d |
---|
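The window attention step in the summary (divide the feature map into equal-sized windows, then attend only within each window) can be illustrated with a minimal NumPy sketch. This is not the patented implementation: the function names, the single attention head, the projection matrices `Wq`, `Wk`, `Wv`, and the window size are all assumptions chosen for illustration, and the record does not specify any of them.

```python
import numpy as np

def window_partition(x, window):
    """Split an (H, W, C) feature map into non-overlapping windows.

    Returns an array of shape (num_windows, window * window, C).
    Assumes H and W are divisible by the window size.
    """
    H, W, C = x.shape
    assert H % window == 0 and W % window == 0
    x = x.reshape(H // window, window, W // window, window, C)
    x = x.transpose(0, 2, 1, 3, 4)  # (rows of windows, cols of windows, window, window, C)
    return x.reshape(-1, window * window, C)

def window_reverse(windows, window, H, W):
    """Inverse of window_partition: reassemble an (H, W, C) feature map."""
    C = windows.shape[-1]
    x = windows.reshape(H // window, W // window, window, window, C)
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(H, W, C)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def window_self_attention(x, window, Wq, Wk, Wv):
    """Single-head self-attention restricted to each window (hypothetical helper)."""
    H, W, _ = x.shape
    tokens = window_partition(x, window)           # (num_windows, tokens per window, C)
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    out = softmax(scores) @ v                      # tokens attend only within their own window
    return window_reverse(out, window, H, W)

# Usage: an 8x8 feature map with 16 channels, split into four 4x4 windows.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 16))
Wq, Wk, Wv = (rng.standard_normal((16, 16)) for _ in range(3))
y = window_self_attention(x, 4, Wq, Wk, Wv)
print(y.shape)  # (8, 8, 16)
```

Because attention is computed per window, the cost grows with the number of windows times the square of the tokens per window, rather than with the square of all tokens in the image, which is the usual motivation for windowed attention on large feature maps.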