An approach to improve SSD through mask prediction of multi-scale feature maps

We propose a novel single shot object detection network with a mask prediction branch. Our motivation is to enhance object detection features with semantic information extracted from deeper layers. The proposed mask prediction branch enriches important features in shallower layers with pixel-wise pr...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Pattern analysis and applications : PAA 2021-08, Vol.24 (3), p.1357-1366
Hauptverfasser: Sun, Peng, Zhao, Yaqin, Zhu, Songhao
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:We propose a novel single shot object detection network with a mask prediction branch. Our motivation is to enhance object detection features with semantic information extracted from deeper layers. The proposed mask prediction branch enriches important features in shallower layers with pixel-wise probability distribution of semantic information. Meanwhile, an improved receptive field block is adopted to increase the scale of receptive field of backbone network without too much extra computing burden. Our network improves the performance significantly over SSD and FSSD (Feature Fusion Single Shot Multi-box Detector) with just a little speed drop. In addition, we discuss the relationship between effective receptive fields and theoretical receptive fields on VGG16 backbone network. Comprehensive experimental results on PASCAL VOC 2007 demonstrate the effectiveness of the proposed method. We achieve a mAP of 79.8 with 300 × 300 input images (81.2 mAP by 512 × 512 inputs) at the speed of 58.4 FPS on a single Nvidia 1080Ti GPU. Experimental results demonstrate that the proposed network achieves a comparable performance with the state-of-the-arts.
ISSN:1433-7541
1433-755X
DOI:10.1007/s10044-021-00993-x