Parallel scalability of face detection in heterogeneous multithreaded architectures

Bibliographic details
Main author: Oro García, David
Format: Dissertation
Language: English
Description
Summary: Facial recognition systems have recently become extremely popular, and deployments of this technology are now ubiquitous. Applications ranging from access control to automated surveillance of video feeds rely on facial recognition to identify persons precisely at multiple locations. Modern facial recognition software targeting surveillance applications typically needs to analyze video streams in order to identify faces in crowds in real time. The first analytical step in a facial recognition system is face detection, which mainly involves determining the precise coordinates and dimensions of all faces appearing in a given image or video frame, and which constitutes the first major bottleneck in the pipeline. Unlike other use cases such as image classification, which usually work well with VGA images, surveillance applications require high or ultra-high-definition resolutions in order to locate and correctly identify people in crowds. Consequently, to maximize the chances of obtaining facial images with enough quality and pixel density for accurate identification, it is essential to develop algorithms and heuristics capable of working with large images. The main challenge is to perform all required computations in just a few milliseconds, so that the subsequent stages of the facial recognition pipeline are not slowed down. In this thesis, we study several low-level parallelization techniques and kernels that solve the face detection problem efficiently and scalably on multithreaded data-parallel GPU architectures. The first part of the thesis covers a multilevel mechanism that exploits both coarse-grained and fine-grained parallelism, combined with careful use of local on-die memories, to reduce GPU underutilization when evaluating boosted cascades of ensembles over high-definition videos.
We demonstrate that our proposed parallelization strategy solves the problem of GPU underutilization and achieves a 5x speedup compared to methods relying on serialized kernel execution. The second part of the thesis presents a heuristic and a hybrid framework that combine hand-crafted features with state-of-the-art convolutional neural networks to address real-time face detection in videos at ultra-high-definition resolutions (4K and 8K). The results show that our proposed heuristic is capable of achieving real-time throughput over