Multi-scale global context feature pyramid network for object detector

Bibliographic Details
Published in: Signal, Image and Video Processing, 2022-04, Vol. 16 (3), p. 705-713
Authors: Li, Yunhao; Shao, Mingwen; Fan, Bingbing; Zhang, Wei
Format: Article
Language: English
Abstract: To capture more contextual information, various attention mechanisms have been applied to object detectors. However, the spatial interaction in commonly used attention mechanisms is single-scale, so it cannot capture contextual information about objects from feature maps at different scales, which leads to underutilization of context. In addition, since the predicted bounding box does not fully fit the shape and pose of the object, there is room for further improvement in performance. In this paper, we propose a multi-scale global context feature pyramid network, a lightweight two-layer neck structure, to obtain a feature pyramid with richer contextual information. Moreover, we extend the regression branch with an additional prediction head that predicts the corner offsets of the bounding boxes, which are used to further refine the boxes and effectively improve their accuracy. Extensive experiments are conducted on the MS COCO 2017 detection dataset. Without bells and whistles, the proposed methods show an average improvement of 2% over the RetinaNet baseline.
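The abstract describes the two additions only at a high level: a lightweight neck step that enriches each pyramid level with context pooled from all scales, and an extra regression output that predicts corner offsets used to refine the boxes. The following is a minimal, hypothetical PyTorch sketch of how such components might look; it is not the authors' implementation, and all names, layer choices, and sizes (MultiScaleGlobalContext, CornerRefinementHead, 256 channels, 9 anchors, global-average-pooling-based gating) are illustrative assumptions.

```python
# Hypothetical sketch of the two ideas the abstract describes (not the authors' code):
#   1. a lightweight neck step letting every FPN level use context pooled from all scales,
#   2. a RetinaNet-style regression branch with an extra corner-offset prediction.
import torch
import torch.nn as nn


class MultiScaleGlobalContext(nn.Module):
    """Re-weights each FPN level with context pooled from every scale (assumed design)."""

    def __init__(self, channels=256, num_levels=5):
        super().__init__()
        # Fuse the globally pooled descriptors of all levels into one context vector.
        self.fuse = nn.Sequential(
            nn.Linear(channels * num_levels, channels),
            nn.ReLU(inplace=True),
        )
        # One gate per pyramid level, produced from the shared multi-scale context.
        self.gates = nn.ModuleList(
            nn.Sequential(nn.Linear(channels, channels), nn.Sigmoid())
            for _ in range(num_levels)
        )

    def forward(self, feats):
        # feats: list of FPN maps [N, C, Hi, Wi], one per pyramid level.
        pooled = [f.mean(dim=(2, 3)) for f in feats]           # [N, C] per level
        context = self.fuse(torch.cat(pooled, dim=1))          # [N, C]
        out = []
        for f, gate in zip(feats, self.gates):
            w = gate(context).unsqueeze(-1).unsqueeze(-1)      # [N, C, 1, 1]
            out.append(f + f * w)                              # residual re-weighting
        return out


class CornerRefinementHead(nn.Module):
    """Box regression branch extended with a corner-offset prediction (assumed design)."""

    def __init__(self, channels=256, num_anchors=9):
        super().__init__()
        self.tower = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Standard 4 box-regression values per anchor.
        self.bbox_pred = nn.Conv2d(channels, num_anchors * 4, 3, padding=1)
        # 8 extra values per anchor: (dx, dy) for each of the 4 box corners.
        self.corner_pred = nn.Conv2d(channels, num_anchors * 8, 3, padding=1)

    def forward(self, feat):
        x = self.tower(feat)
        return self.bbox_pred(x), self.corner_pred(x)


if __name__ == "__main__":
    # Five FPN levels (P3-P7 style) with decreasing spatial size, batch of 2.
    feats = [torch.randn(2, 256, s, s) for s in (64, 32, 16, 8, 4)]
    neck = MultiScaleGlobalContext()
    head = CornerRefinementHead()
    enriched = neck(feats)
    boxes, corners = head(enriched[0])
    print(boxes.shape, corners.shape)  # [2, 36, 64, 64] and [2, 72, 64, 64]
```

In this sketch each pyramid level is modulated by a gate computed from context pooled across all scales, and the head outputs eight corner offsets per anchor alongside the usual four box-regression values, which a decoder could use to nudge the predicted corners toward the object boundary.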
ISSN: 1863-1703, 1863-1711
DOI: 10.1007/s11760-021-02010-4