EV-TIFNet: lightweight binocular fusion network assisted by event camera time information for 3D human pose estimation
Published in: | Journal of real-time image processing 2024-08, Vol.21 (4), p.150, Article 150 |
---|---|
Main authors: | , , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | Human pose estimation using RGB cameras often encounters performance degradation in challenging scenarios such as motion blur or suboptimal lighting. In comparison, event cameras, endowed with a wide dynamic range, microsecond-scale temporal resolution, minimal latency, and low power consumption, demonstrate remarkable adaptability in extreme visual environments. Nevertheless, current research on event-camera pose estimation has not yet fully harnessed the potential of event-driven data, and enhancing model efficiency remains an ongoing pursuit. This work focuses on devising an efficient, compact pose estimation algorithm, with particular attention to optimizing the fusion of multi-view event streams for improved pose prediction accuracy. We propose EV-TIFNet, a compact dual-view interactive network, which incorporates event frames along with our custom-designed Global Spatio-Temporal Feature Maps (GTF Maps). To enhance the network’s ability to understand motion characteristics and localize keypoints, we have tailored a dedicated Auxiliary Information Extraction Module (AIE Module) for the GTF Maps. Experimental results demonstrate that our model, with a compact parameter count of 0.55 million, achieves notable advancements on the DHP19 dataset, reducing the $\text{MPJPE}_{3D}$ to 61.45 mm. Building upon the sparsity of event data, the integration of sparse convolution operators replaces a significant portion of traditional convolutional layers, leading to a reduction in computational demand by 28.3%, totalling 8.71 GFLOPs. These design choices highlight the model’s suitability and efficiency in scenarios where computational resources are limited. |
---|---|
ISSN: | 1861-8200 1861-8219 |
DOI: | 10.1007/s11554-024-01528-3 |
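
The summary reports accuracy as a 3D MPJPE (mean per-joint position error) in millimetres. As a minimal sketch of how such a metric is commonly computed, the following assumes each pose is a list of (x, y, z) joint coordinates in millimetres; the function name `mpjpe` and the sample coordinates are hypothetical and are not taken from the article or from the DHP19 dataset.

```python
import math


def mpjpe(pred, gt):
    """Mean per-joint position error between two poses.

    Both arguments are sequences of (x, y, z) joint coordinates in the
    same unit (assumed here to be millimetres) and in the same joint order.
    """
    if len(pred) != len(gt) or len(pred) == 0:
        raise ValueError("poses must be non-empty and have the same number of joints")
    # Average the Euclidean distance between each predicted joint and its ground truth.
    total = sum(math.dist(p, g) for p, g in zip(pred, gt))
    return total / len(pred)


# Hypothetical two-joint example (illustrative values only).
gt_pose = [(0.0, 0.0, 0.0), (100.0, 0.0, 0.0)]
pred_pose = [(3.0, 4.0, 0.0), (100.0, 0.0, 12.0)]
error_mm = mpjpe(pred_pose, gt_pose)  # joint errors are 5 mm and 12 mm
print(f"MPJPE: {error_mm:.2f} mm")
```

A dataset-level figure such as the one quoted in the summary would typically average this per-pose value over all evaluated frames; the exact evaluation protocol used in the article is not stated in this record.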