Variable block size motion estimation implementation on compute unified device architecture (CUDA)

Bibliographic details
Main authors:	Lee, Dong-Kyu; Oh, Seoung-Jun
Format:	Conference proceeding
Language:	English
Subjects:	Architecture; Blocking; Central processing units; Computer architecture; Consumption; Devices; Graphics processing units; High definition video; Instruction sets; Mathematical models; Motion estimation; Motion simulation; Reduction; Synchronization; Video coding

Description
Summary:	This paper proposes a highly parallel variable block size full search motion estimation algorithm with concurrent parallel reduction (CPR) on a graphics processing unit (GPU) using the compute unified device architecture (CUDA). The approach minimizes memory access latency by using the GPU's high-speed on-chip memory. By applying parallel reductions concurrently, depending on the amount of data and the data dependency, it increases thread utilization and decreases the number of synchronization points, which cause latency. Experimental results show that the proposed approach achieves a substantial improvement, up to 92 times, over the CPU-only counterpart.
ISSN:	2158-3994 2158-4001
DOI:	10.1109/ICCE.2013.6487048
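
The summary names two building blocks: variable block size full search and parallel reduction. The sketch below illustrates both in plain, sequential Python. It is not the paper's CUDA kernel and does not reproduce its concurrent parallel reduction (CPR) scheme. All function names, the 16x16 macroblock size, and the choice of the sum of absolute differences (SAD) as the matching cost are assumptions made for illustration. On a GPU, the pairs combined in each step of the tree reduction would typically be handled by separate threads, with a synchronization point between steps.

```python
def sad_grid_4x4(cur, ref, ox, oy):
    """SADs of the sixteen 4x4 sub-blocks of a 16x16 block.

    cur is a 16x16 list of rows; (ox, oy) is the top-left corner of the
    candidate block inside the reference frame ref.
    """
    grid = [[0] * 4 for _ in range(4)]
    for y in range(16):
        for x in range(16):
            grid[y // 4][x // 4] += abs(cur[y][x] - ref[oy + y][ox + x])
    return grid


def merge_sads(grid):
    """Build larger-block SADs by summing 4x4 SADs instead of re-reading pixels."""
    s8 = [[grid[2 * r][2 * c] + grid[2 * r][2 * c + 1]
           + grid[2 * r + 1][2 * c] + grid[2 * r + 1][2 * c + 1]
           for c in range(2)] for r in range(2)]
    s16 = s8[0][0] + s8[0][1] + s8[1][0] + s8[1][1]
    return {"4x4": grid, "8x8": s8, "16x16": s16}


def tree_reduce_min(values):
    """Pairwise (tree) reduction over a non-empty list.

    Returns (minimum, index); ties resolve to the lowest index.
    Each pass of the while loop halves the number of live items.
    """
    items = [(v, i) for i, v in enumerate(values)]
    while len(items) > 1:
        half = (len(items) + 1) // 2
        nxt = []
        for k in range(half):
            a = items[k]
            if k + half < len(items):
                a = min(a, items[k + half])
            nxt.append(a)
        items = nxt
    return items[0]


def full_search_16x16(cur, ref, bx, by, search_range):
    """Exhaustive search around (bx, by); returns ((dx, dy), best 16x16 SAD)."""
    vectors, sads = [], []
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            ox, oy = bx + dx, by + dy
            # Skip candidates that fall outside the reference frame.
            if 0 <= ox <= len(ref[0]) - 16 and 0 <= oy <= len(ref) - 16:
                vectors.append((dx, dy))
                sads.append(merge_sads(sad_grid_4x4(cur, ref, ox, oy))["16x16"])
    best_sad, idx = tree_reduce_min(sads)
    return vectors[idx], best_sad
```

The same merged SADs could feed one reduction per block size (4x4, 8x8, 16x16) to pick a motion vector for each partition; only the 16x16 case is shown here to keep the sketch short.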