Depthwise grouped convolution for object detection

Bibliographic details
Published in: Machine vision and applications, 2021-11, Vol. 32 (6), Article 115
Main authors: Liao, Yongwei; Lu, Siwei; Yang, Zhenguo; Liu, Wenyin
Format: Article
Language: English
Description
Summary: Object detection usually adopts two-stage end-to-end networks, which use a backbone network (such as VGG or ResNet) for feature extraction, combined with a region proposal network (RPN) for object localization and classification. In this paper, we explore a novel depthwise grouped convolution (DGC) in the backbone network that integrates channel grouping with depthwise separable convolution, sharing convolution parameters across channels to reduce the number of parameters and speed up training. In particular, channel split and shuffle strategies are introduced to enhance information exchange between different groups of channels in the DGC block, which can prevent the performance drop caused by insufficient object samples. Furthermore, a non-local block is adopted in the RPN to focus on small objects that are hard to identify. We also introduce a margin-based loss to guide model training together with the classification and regression losses. Experiments conducted on the VOC2007, VOC2012 and COCO2017 datasets demonstrate the efficiency and effectiveness of our method for object detection.
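The summary does not give the internals of the DGC block, so the following is only a rough sketch of the two general ideas it names: how grouping and depthwise separable convolution reduce the weight count, and how a channel shuffle mixes channels across groups. The function names, the layer sizes, and the ShuffleNet-style shuffle order are illustrative assumptions, not the authors' implementation.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count (bias ignored) of a k x k convolution split into `groups` groups."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    # Each output channel only sees c_in / groups input channels.
    return (c_in // groups) * c_out * k * k


def depthwise_separable_params(c_in, c_out, k, pointwise_groups=1):
    """Depthwise k x k convolution followed by a (possibly grouped) 1 x 1 convolution."""
    depthwise = conv_params(c_in, c_in, k, groups=c_in)  # one k x k filter per channel
    pointwise = conv_params(c_in, c_out, 1, groups=pointwise_groups)
    return depthwise + pointwise


def channel_shuffle(channels, groups):
    """Assumed ShuffleNet-style shuffle: view as (groups, n), transpose, flatten."""
    if len(channels) % groups:
        raise ValueError("number of channels must be divisible by groups")
    n = len(channels) // groups
    return [channels[g * n + i] for i in range(n) for g in range(groups)]


# Hypothetical layer: 256 -> 256 channels, 3 x 3 kernel, 4 groups.
standard = conv_params(256, 256, 3)
grouped_separable = depthwise_separable_params(256, 256, 3, pointwise_groups=4)
print(standard, grouped_separable)

# Channels 0-3 form group 0 and 4-7 form group 1; after the shuffle the
# two groups are interleaved, so the next grouped layer sees both.
print(channel_shuffle(list(range(8)), groups=2))
```

For this hypothetical layer the standard convolution has 589,824 weights and the grouped depthwise separable variant has 18,688. Without a shuffle step, stacked grouped layers would never exchange information between groups, which is the motivation the summary gives for the split and shuffle strategies.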
ISSN: 0932-8092, 1432-1769
DOI: 10.1007/s00138-021-01243-0