SGDM: Static-Guided Dynamic Module Make Stronger Visual Models
The spatial attention mechanism has been widely used to improve object detection performance. However, its operation is currently limited to static convolutions lacking content-adaptive features. This paper innovatively approaches from the perspective of dynamic convolution. We propose Razor Dynamic...
Gespeichert in:
Hauptverfasser: | , , |
---|---|
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | The spatial attention mechanism has been widely used to improve object
detection performance. However, its operation is currently limited to static
convolutions lacking content-adaptive features. This paper innovatively
approaches from the perspective of dynamic convolution. We propose Razor
Dynamic Convolution (RDConv) to address thetwo flaws in dynamic weight
convolution, making it hard to implement in spatial mechanism: 1) it is
computation-heavy; 2) when generating weights, spatial information is
disregarded. Firstly, by using Razor Operation to generate certain features, we
vastly reduce the parameters of the entire dynamic convolution operation.
Secondly, we added a spatial branch inside RDConv to generate convolutional
kernel parameters with richer spatial information. Embedding dynamic
convolution will also bring the problem of sensitivity to high-frequency noise.
We propose the Static-Guided Dynamic Module (SGDM) to address this limitation.
By using SGDM, we utilize a set of asymmetric static convolution kernel
parameters to guide the construction of dynamic convolution. We introduce the
mechanism of shared weights in static convolution to solve the problem of
dynamic convolution being sensitive to high-frequency noise. Extensive
experiments illustrate that multiple different object detection backbones
equipped with SGDM achieve a highly competitive boost in performance(e.g., +4%
mAP with YOLOv5n on VOC and +1.7% mAP with YOLOv8n on COCO) with negligible
parameter increase(i.e., +0.33M on YOLOv5n and +0.19M on YOLOv8n). |
---|---|
DOI: | 10.48550/arxiv.2403.18282 |