Few-shot object detection with semantic enhancement and semantic prototype contrastive learning

Bibliographic details
Published in:	Knowledge-based systems 2022-09, Vol.252, p.109411, Article 109411
Main authors:	Huang, Lian; Dai, Shaosheng; He, Ziqiang
Format:	Article
Language:	English
Keywords:	Cross-attention; Few-shot learning; Object detection; Supervised contrastive learning; Word embedding
Online access:	Full text

Description
Abstract:	Few-shot object detection (FSOD), which aims to teach machines to detect objects belonging to novel classes from extremely few annotated examples, has attracted extensive research interest. However, the performance of FSOD is still limited by the lack of data. The visual information of novel objects has significant intraclass variance under the few-shot setting, so visual information alone cannot accurately represent the objects themselves. In contrast, humans are good at combining visual and semantic systems to recognize new concepts. In this paper, we explore using additional semantic knowledge to assist the FSOD task. Concretely, we first obtain the semantic representation of each class from a word embedding model learned on a large text corpus. We then design a semantic enhancement (SE) module to enhance the incomplete visual representation of novel classes. To further improve classification performance, we define a semantic prototype contrastive (SPC) loss to learn a more discriminative embedding space, in which the features to be detected that belong to the same class are compactly clustered around the corresponding semantic representation. Furthermore, we introduce a semantic margin between different semantic representations into the SPC loss to adaptively set the margin between features belonging to different classes. Extensive experiments on the PASCAL VOC and MS-COCO benchmarks demonstrate that the proposed method achieves state-of-the-art performance.
• Semantic knowledge strengthens the expression of visual information.
• Semantic prototype contrastive learning enables the classifier to learn a more discriminative embedding space.
• Semantic margin facilitates the separation of features belonging to similar classes.
• The proposed method achieves state-of-the-art few-shot object detection performance.
ISSN:	0950-7051, 1872-7409
DOI:	10.1016/j.knosys.2022.109411
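
The abstract describes a contrastive loss that pulls each feature toward the semantic representation of its class and uses a semantic margin to separate similar classes. The record does not give the formula, so the following is only a minimal sketch of one way such a loss could look, not the authors' definition. The function name `spc_loss_sketch`, the temperature `tau`, the margin scale `alpha`, and the choice of a margin proportional to the cosine similarity of two class prototypes are all assumptions made for illustration, as is the premise that the semantic vectors have already been projected into the feature space.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def spc_loss_sketch(feature, label, prototypes, tau=0.2, alpha=0.5):
    """Illustrative prototype-contrastive loss for a single feature.

    prototypes: one semantic vector per class, assumed (hypothetically)
    to live in the same space as `feature`.
    tau, alpha: hypothetical temperature and margin scale.
    """
    positive = prototypes[label]
    pos_logit = cosine(feature, positive) / tau
    logits = [pos_logit]
    for c, proto in enumerate(prototypes):
        if c == label:
            continue
        # Assumed margin: larger when the two class prototypes are similar,
        # so semantically close classes are pushed apart harder.
        margin = alpha * max(0.0, cosine(positive, proto))
        logits.append((cosine(feature, proto) + margin) / tau)
    # Numerically stable negative log-softmax of the positive logit.
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum - pos_logit


# Toy usage: three 2-D class prototypes, two of them close to each other.
toy_prototypes = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
toy_feature = [1.0, 0.05]
loss_for_class_0 = spc_loss_sketch(toy_feature, 0, toy_prototypes)
```

In this sketch the loss shrinks as the feature aligns with its own class prototype, and setting `alpha` to zero reduces it to a plain softmax over cosine similarities, which makes the effect of the assumed margin easy to isolate.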