GET: group equivariant transformer for person detection of overhead fisheye images

Bibliographic Details
Published in: Applied Intelligence (Dordrecht, Netherlands), 2023-10, Vol. 53 (20), p. 24551-24565
Main Authors: Chen, Yongqing; Zhu, Dandan; Li, Nanyu; Zhou, You; Bai, Yong
Format: Article
Language: English
Online Access: Full text
Description
Abstract: Fisheye cameras have a large field of view, so they are widely used in scene monitoring, robot navigation, intelligent systems, virtual and augmented reality panoramas, and other fields, but person detection under an overhead fisheye camera remains a challenge due to its unique radial geometry and barrel distortion. Generic object detection algorithms do not work well for person detection on panoramic images from fisheye cameras. Recent approaches either use radially aligned bounding boxes to detect persons or adapt anchor-based methods to obtain rotated bounding boxes. However, these methods require additional hyperparameters (e.g., anchor boxes) and have low generalization ability. To address this issue, we propose a novel model called the Group Equivariant Transformer (GET), which uses a Transformer to directly regress bounding boxes and rotation angles. GET does not need any additional hyperparameters and generalizes well. In GET, we use a Group Equivariant Convolutional Network (GECN) and a Multi-Scale Encoder Module (MEM) to extract multi-scale rotated embedding features of the overhead fisheye image for the Transformer; we then propose an embedding optimization loss to improve the diversity of these features. Finally, we use a Decoder Module (DM) to decode the rotated bounding boxes' information from the embedding features. Extensive experiments conducted on three benchmark fisheye camera datasets demonstrate that the proposed method achieves state-of-the-art performance.
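The abstract describes a three-stage pipeline: group-equivariant feature extraction (GECN), multi-scale encoding (MEM), and a Transformer decoder (DM) that regresses rotated boxes directly, with no anchor-box hyperparameters. The authors' implementation is not reproduced here; the following is a minimal PyTorch sketch of how such a pipeline could be wired together, using a simple C4 (90-degree rotation) group convolution as a stand-in for the GECN and a DETR-style query decoder as a stand-in for the MEM and DM. All module and parameter names (C4EquivariantConv, GETSketch, num_queries, etc.) are hypothetical.

```python
import torch
import torch.nn as nn

class C4EquivariantConv(nn.Module):
    """Applies one shared conv to 4 rotated copies of the input (the C4
    rotation group) and max-pools over orientations -- a toy stand-in for
    a group-equivariant convolution layer."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)

    def forward(self, x):
        outs = []
        for k in range(4):  # rotate by k*90 deg, convolve, rotate back
            y = self.conv(torch.rot90(x, k, dims=(2, 3)))
            outs.append(torch.rot90(y, -k, dims=(2, 3)))
        return torch.stack(outs, dim=0).max(dim=0).values

class GETSketch(nn.Module):
    """Hypothetical GET-like detector: equivariant backbone -> flattened
    image tokens -> query-based Transformer decoder -> rotated-box heads."""
    def __init__(self, d_model=128, num_queries=50):
        super().__init__()
        self.backbone = nn.Sequential(          # stand-in for GECN + MEM
            C4EquivariantConv(3, 32), nn.ReLU(), nn.MaxPool2d(2),
            C4EquivariantConv(32, d_model), nn.ReLU(), nn.MaxPool2d(2),
        )
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)  # stand-in for DM
        self.queries = nn.Embedding(num_queries, d_model)
        self.box_head = nn.Linear(d_model, 5)   # (cx, cy, w, h, angle), normalized to [0, 1]
        self.cls_head = nn.Linear(d_model, 2)   # person vs. background logits

    def forward(self, images):
        feats = self.backbone(images)               # (B, C, H/4, W/4)
        tokens = feats.flatten(2).transpose(1, 2)   # (B, H*W/16, C)
        q = self.queries.weight.unsqueeze(0).repeat(images.size(0), 1, 1)
        h = self.decoder(q, tokens)                 # queries cross-attend to image tokens
        return self.box_head(h).sigmoid(), self.cls_head(h)

boxes, logits = GETSketch()(torch.randn(2, 3, 128, 128))
print(boxes.shape, logits.shape)  # torch.Size([2, 50, 5]) torch.Size([2, 50, 2])
```

Because each query regresses its own (cx, cy, w, h, angle) tuple, no anchor boxes appear anywhere in the sketch, which is consistent with the abstract's claim that GET avoids such hyperparameters.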
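The abstract also mentions an embedding optimization loss that improves the diversity of the embedding features, but does not give its form. The sketch below is one plausible reading under that assumption: penalizing pairwise cosine similarity between decoder embeddings so that different queries encode different persons. The function name and formulation are hypothetical.

```python
import torch
import torch.nn.functional as F

def embedding_diversity_loss(emb):
    """emb: (B, N, C) decoder embeddings; lower loss = more diverse embeddings."""
    e = F.normalize(emb, dim=-1)                   # unit-norm embeddings
    sim = e @ e.transpose(1, 2)                    # (B, N, N) pairwise cosine similarity
    off_diag = sim - torch.diag_embed(sim.diagonal(dim1=1, dim2=2))  # zero the diagonal
    return off_diag.abs().mean()                   # penalize similarity between distinct queries

print(embedding_diversity_loss(torch.randn(2, 50, 128)))
```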
ISSN: 0924-669X
1573-7497
DOI: 10.1007/s10489-023-04747-6