SnipeDet: Attention-guided pyramidal prediction kernels for generic object detection

Bibliographic Details
Published in: Pattern Recognition Letters, 2021-12, Vol. 152, p. 302-310
Main Authors: Chen, Suting; Cheng, Zehua; Zhang, Liangchen; Zheng, Yujie
Format: Article
Language: English
Description
Summary:
• The APPK module is proposed to tackle misalignment with objects of different scales.
• An IoU-adaptive loss function helps the network deal with hard negative samples.
• The SORR module is devised to improve detection efficiency.
• An interleaved subsampling method enhances feature representations.

Single-scale prediction kernels and Region of Interest (RoI) pooling, as used in the prediction modules of modern object detectors, match objects of different scales poorly. State-of-the-art detectors with a feature pyramid structure built on feature maps of different resolutions help alleviate this problem. Even with this structure, however, single-scale prediction kernels and RoI pooling still struggle to detect small objects, and single-scale prediction kernels continue to suffer from misalignment on very large objects. In this paper, we propose the attention-guided pyramidal prediction kernels (APPK) module with a customized IoU-adaptive loss function to deal with the misalignment between the prediction module and objects of different scales. To mitigate the cost of the heavy detection head, we also introduce the salient object regions recognition (SORR) module, which identifies the regions that have strong object cues. Additionally, interleaved subsampling, the proposed feature enhancement approach, is used to generate highly discriminative feature representations. We refer to the detection framework formed by these methods as SnipeDet. Results show that SnipeDet achieves 41.1 AP at 15.4 FPS on the MS COCO test-dev set with 512 × 512 input images, which outperforms state-of-the-art one-stage detectors and offers a better trade-off between speed and accuracy.
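The summary does not spell out the form of the IoU-adaptive loss, so the following is only a minimal sketch of the quantity it builds on: the standard intersection over union (IoU) of two axis-aligned boxes. The function name `box_iou` and the `(x1, y1, x2, y2)` box format are assumptions made for illustration and are not taken from the paper.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2).

    Illustrative helper only; the name and box format are assumptions.
    """
    # Corners of the overlap rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])

    # Clamp to zero so disjoint boxes yield no intersection area.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    # Guard against degenerate (zero-area) inputs.
    return inter / union if union > 0 else 0.0
```

For example, `box_iou((0, 0, 2, 2), (1, 1, 3, 3))` has an overlap of area 1 and a union of area 7. How SnipeDet uses such a value to adapt its loss for hard negative samples is defined in the paper itself, not here.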
ISSN: 0167-8655, 1872-7344
DOI: 10.1016/j.patrec.2021.10.026