From macro to micro: rethinking multi-scale pedestrian detection

Bibliographic details
Published in: Multimedia Systems 2023-06, Vol. 29 (3), p. 1417-1429
Main authors: He, Yuzhe; He, Ning; Yu, Haigang; Zhang, Ren; Yan, Kang
Format: Article
Language: English
Description
Summary: Pedestrian detection uses computer vision techniques to determine whether pedestrians are present in an image or video sequence and to localize them precisely. Variation in pedestrian scale has long been a difficult problem for this task. In contrast to existing research, this study considers multi-scale pedestrian detection jointly at the macro- and micro-levels. At the macro-level, the shape and location of each anchor are predicted from the feature maps to guide anchor generation, so the resulting anchors adapt better to pedestrian targets at different scales. At the micro-level, the standard convolution in the backbone network is replaced with switchable atrous convolution, which effectively addresses the scale differences between pedestrians. Finally, a Double Head completes the classification and regression tasks more efficiently. These elements are combined into a multi-scale pedestrian detection network, and experimental results show that the proposed model substantially improves multi-scale pedestrian detection. On the COCOPersons dataset, detection accuracy reaches an average precision (AP) of 57.3. Compared with Faster R-CNN based on a feature pyramid network, the model improves accuracy at large, medium, and small scales by 1.7 AP, 2.5 AP, and 6.8 AP, respectively. On the Caltech pedestrian dataset, the MR⁻² on the Near, Medium, and Far subsets reaches 0.45%, 13.78%, and 48.85%, respectively. On the CityPersons pedestrian dataset, the MR⁻² on the Small, Medium, and Large subsets reaches 12.1%, 2.6%, and 5.5%, respectively.
ISSN: 0942-4962, 1432-1882
DOI: 10.1007/s00530-023-01058-1
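
Note on the miss-rate metric: the Caltech and CityPersons figures in the summary are log-average miss rates, conventionally written MR⁻². The metric is the geometric mean of the miss rate sampled at nine false-positives-per-image (FPPI) values spaced evenly in log space between 10⁻² and 10⁰. The following is a minimal sketch of that computation, not the authors' evaluation code. The function name, the input layout (a curve given as two parallel lists sorted by ascending FPPI), and the fallback of 1.0 when the curve has no point at or below a reference FPPI are assumptions made for illustration.

```python
import math


def log_average_miss_rate(fppi, miss_rate, num_points=9):
    """Geometric mean of the miss rate at log-spaced FPPI reference points.

    fppi and miss_rate are parallel lists describing one detector curve,
    sorted by ascending FPPI (hypothetical input layout).
    """
    if len(fppi) != len(miss_rate) or not fppi:
        raise ValueError("fppi and miss_rate must be non-empty and equal in length")
    if num_points < 2:
        raise ValueError("num_points must be at least 2")

    # Reference FPPI values: 10^-2 ... 10^0, evenly spaced in log space.
    refs = [10.0 ** (-2.0 + 2.0 * i / (num_points - 1)) for i in range(num_points)]

    sampled = []
    for ref in refs:
        # Assumption: if the curve never reaches this FPPI, count a miss rate of 1.0.
        value = 1.0
        for x, y in zip(fppi, miss_rate):
            if x <= ref:
                value = y  # keep the last curve point at or below the reference
            else:
                break
        # Clamp away from zero so the logarithm is defined.
        sampled.append(max(value, 1e-10))

    return math.exp(sum(math.log(v) for v in sampled) / len(sampled))


if __name__ == "__main__":
    # Toy curve, not data from the paper.
    curve_fppi = [0.001, 0.05]
    curve_miss = [0.4, 0.1]
    print(f"MR^-2 = {100 * log_average_miss_rate(curve_fppi, curve_miss):.2f}%")
```

Because the average is taken in log space, a detector is rewarded for keeping the miss rate low across the whole FPPI range rather than at a single operating point. Lower values are better.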