Joint deep separable convolution network and border regression reinforcement for object detection

Bibliographic details
Published in: Neural computing & applications 2021-05, Vol.33 (9), p.4299-4314
Main authors: Quan, Yu; Li, Zhixin; Chen, Shengjia; Zhang, Canlong; Ma, Huifang
Format: Article
Language: English
Description
Abstract: Improvements in object detection performance depend mainly on extracting local information near the target region of interest, which is also the main reason the resulting features lack semantic information. Considering the importance of scene and semantic information for visual recognition, this paper improves the object detection algorithm in three parts. First, the basic residual convolution module is fused with the separable convolution module to construct a depth-wise separable convolution network (D_SCNet-127 R-CNN). Then, the feature map is sent to the scene-level region proposal self-attention network to re-identify the candidate regions. This part is composed of three parallel branches: a semantic segmentation module, a region proposal network, and a region proposal self-attention module. Finally, the paper combines deep reinforcement learning with a border regression network to localize objects precisely, and improves the computation speed of the entire model through a light-weight head network. The model can effectively overcome the feature extraction limitations of traditional object detection and obtain more comprehensive, detailed features. Experiments on the MSCOCO17, Pascal VOC07, and Cityscapes datasets show that the proposed method is effective and scalable.
ISSN: 0941-0643, 1433-3058
DOI: 10.1007/s00521-020-05255-1