Context-aware co-supervision for accurate object detection

•We advocate the importance of equipping two-stage detectors with top-down signals, in order to which provides high-level contextual cues to complement low-level features. In practice, this is implemented by adding a side path in the detection head to predict all object classes in the image, which i...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Pattern recognition 2022-01, Vol.121, p.108199, Article 108199
Hauptverfasser: Peng, Junran, Wang, Haoquan, Yue, Shaolong, Zhang, Zhaoxiang
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:•We advocate the importance of equipping two-stage detectors with top-down signals, in order to which provides high-level contextual cues to complement low-level features. In practice, this is implemented by adding a side path in the detection head to predict all object classes in the image, which is co-supervised by image-level semantics and requires little extra overheads.•Our research reveals the usefulness of combining top-down and bottom-up signals in object detection, and we believe it to be generalized to other tasks. The simplicity and originality of our approach leave much room for future research, in which we will append more powerful modules to enhance contexts and other cues for visual recognition. State-of-the-art object detection approaches are often composed of two stages, namely, proposing a number of regions on an image and classifying each of them into one class. Both stages share a network backbone which builds visual features in a bottom-up manner. In this paper, we advocate the importance of equipping two-stage detectors with top-down signals, in order to which provides high-level contextual cues to complement low-level features. In practice, this is implemented by adding a side path in the detection head to predict all object classes in the image, which is co-supervised by image-level semantics and requires little extra overheads. Our approach is easily applied to two popular object detection algorithms, and achieves consistent performance gain in the MS-COCO dataset.
ISSN:0031-3203
1873-5142
DOI:10.1016/j.patcog.2021.108199