Coordinate-based anchor-free module for object detection

Bibliographic details
Published in:	Applied intelligence (Dordrecht, Netherlands), 2021-12, Vol.51 (12), p.9066-9080
Main authors:	Tang, Zhiyong; Yang, Jianbing; Pei, Zhongcai; Song, Xiao
Format:	Article
Language:	English
Subjects:	Ambiguity; Artificial Intelligence; Boxes; Classification; Computer Science; Computer Science, Artificial Intelligence; Datasets; Feature maps; Machines; Manufacturing; Mechanical Engineering; Modules; Object recognition; Processes; Science & Technology; Sensors; Target detection; Technology; Training

Description
Abstract:	Despite the impressive performance of recent state-of-the-art detectors, small-target detection, scale variation, and label ambiguity remain challenging. To tackle these issues, we present a coordinate-based anchor-free (CBAF) module for object detection. It can be used as a branch of a single-shot detector (e.g., RetinaNet or SSD) or can predict the output probabilities and coordinates directly. The main idea of the CBAF module is to predict the category of an object and the adjustments to its box from a part feature and its contextual part features, which are obtained by dividing feature maps along spatial coordinates. This is inspired by the fact that human beings can infer an entire object by observing a part and its surrounding environment. The CBAF module encodes and decodes boxes in an anchor-free manner on each feature map, at different resolutions, during both training and testing. During training, we first use the proposed spatial coordinate partition layer to divide feature maps into several parts of size n × n and then use the proposed contextual building layer to fuse each part with its contextual parts. We demonstrate the CBAF module through a concrete implementation. The CBAF module improves the AP scores of object detection with nearly no additional computation when working in conjunction with the anchor-based RetinaNet. Experimental results on the MS-COCO dataset show that the CBAF module increases mAP by 1.1% compared with RetinaNet, and that mAP increases by 2.2% when the CBAF module works in conjunction with the anchor-based RetinaNet.
ISSN:	0924-669X 1573-7497
DOI:	10.1007/s10489-021-02373-8
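
The partition and fusion steps mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `partition_feature_map` and `fuse_with_context`, the non-overlapping tiling, and the choice of averaging the eight neighbouring parts as "context" are all assumptions made for illustration, since the abstract does not say how contextual parts are selected or fused.

```python
import numpy as np


def partition_feature_map(fmap, n):
    """Split a (C, H, W) feature map into non-overlapping n x n parts.

    Returns an array of shape (H // n, W // n, C, n, n), indexed as
    [grid_row, grid_col, channel, row_in_part, col_in_part].
    Assumption: H and W are divisible by n.
    """
    c, h, w = fmap.shape
    if h % n or w % n:
        raise ValueError("H and W must be divisible by n in this sketch")
    # (C, H, W) -> (C, H//n, n, W//n, n) -> (H//n, W//n, C, n, n)
    parts = fmap.reshape(c, h // n, n, w // n, n)
    return parts.transpose(1, 3, 0, 2, 4)


def fuse_with_context(parts):
    """Concatenate each part with the mean of its 8 neighbouring parts.

    Hypothetical fusion rule: neighbours outside the grid are treated as
    zeros, and the context is stacked along the channel axis, so the
    output has shape (H // n, W // n, 2 * C, n, n).
    """
    gh, gw = parts.shape[:2]
    # Zero-pad one part on every side of the grid of parts.
    padded = np.pad(parts, ((1, 1), (1, 1), (0, 0), (0, 0), (0, 0)))
    context = np.zeros(parts.shape, dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            if dy == 1 and dx == 1:
                continue  # skip the part itself
            context += padded[dy:dy + gh, dx:dx + gw]
    context /= 8.0
    return np.concatenate([parts, context], axis=2)


if __name__ == "__main__":
    fmap = np.random.rand(4, 8, 8)          # C=4, H=W=8
    parts = partition_feature_map(fmap, 2)  # 4 x 4 grid of 2 x 2 parts
    fused = fuse_with_context(parts)
    print(parts.shape, fused.shape)
```

In a detector, a prediction head would then map each fused part to class scores and box adjustments; that head is omitted here because the abstract gives no details about it.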