A Fast Discrete Wavelet Transform Using Hybrid Parallelism on GPUs

Wavelet transform has been widely used in many signal and image processing applications. Due to its wide adoption for time-critical applications, such as streaming and real-time signal processing, many acceleration techniques were developed during the past decade. Recently, the graphics processing u...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	IEEE transactions on parallel and distributed systems 2016-11, Vol.27 (11), p.3088-3100
Hauptverfasser:	Quan, Tran Minh, Jeong, Won-Ki
Format:	Artikel
Sprache:	eng
Schlagworte:	Acceleration bit rotation Discrete wavelet transforms GPU computing Graphics processing units hybrid parallelism lifting scheme Parallel processing Registers Wavelet transform Wavelet transforms
Online-Zugang:	Volltext bestellen
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Wavelet transform has been widely used in many signal and image processing applications. Due to its wide adoption for time-critical applications, such as streaming and real-time signal processing, many acceleration techniques were developed during the past decade. Recently, the graphics processing unit (GPU) has gained much attention for accelerating computationally-intensive problems and many solutions of GPU-based discrete wavelet transform (DWT) have been introduced, but most of them did not fully leverage the potential of the GPU. In this paper, we present various state-of-the-art GPU optimization strategies in DWT implementation, such as leveraging shared memory, registers, warp shuffling instructions, and thread- and instruction-level parallelism (TLP, ILP), and finally elaborate our hybrid approach to further boost up its performance. In addition, we introduce a novel mixed-band memory layout for Haar DWT, where multi-level transform can be carried out in a single fused kernel launch. As a result, unlike recent GPU DWT methods that focus mainly on maximizing ILP, we show that the optimal GPU DWT performance can be achieved by hybrid parallelism combining both TLP and ILP together in a mixed-band approach. We demonstrate the performance of our proposed method by comparison with other CPU and GPU DWT methods.
ISSN:	1045-9219 1558-2183
DOI:	10.1109/TPDS.2016.2536028