An efficient parallel deblocking filter based on GPU: Implementation and optimization

Bibliographic Details
Main Authors: Huayou Su, Chunyuan Zhang, Jun Chai, Qianming Yang
Format: Conference Proceeding
Language: English
Description
Summary: The deblocking filter is one of the most time-consuming tasks in the H.264/AVC standard. Because of its data dependencies and frequent memory accesses, mapping the algorithm efficiently onto massively parallel architectures is a difficult challenge. In this paper, a novel GPU-based parallel deblocking filter is proposed, which weakens the dependencies between macroblocks (MBs) by rearranging the order in which boundaries are filtered. We implemented the proposed algorithm on a GPU and optimized the program through three strategies: kernel combination, reuse of intermediate data, and optimized data representation. Experimental results show that the proposed parallel method supports real-time processing of 1080p video at a throughput of 450 fps. We also observed speedups of 3.78× and 16.68× for the fully optimized parallel deblocking filter relative to an implementation on a two-core processor and a state-of-the-art GPU-based implementation, respectively.
ISSN: 1555-5798, 2154-5952
DOI: 10.1109/PACRIM.2011.6032906