Analysis and architecture design of scalable fractional motion estimation for H.264 encoding

Bibliographic details
Published in: Integration (Amsterdam), 2012-09, Vol. 45 (4), pp. 427-438
Main authors: Vasiljevic, Jasmina; Ye, Andy
Format: Article
Language: English
Description
Abstract: Fractional Motion Estimation (FME) is an important part of the H.264/AVC video encoding standard. The algorithm can significantly increase the compression ratio of video encoders while improving video quality. However, it is computationally expensive and can account for over 45% of the total motion estimation runtime. To maximize the performance and utilization of FME implementations on Field-Programmable Gate Arrays (FPGAs), one needs to effectively exploit the inherent parallelism in the algorithm. In this work, we explore two approaches to parallelizing the FME algorithm in order to increase the processing power of the computing hardware. We call the first method vertical scaling and the second horizontal scaling. We implemented six scaled FME designs on a Xilinx XC5VLX85T (Virtex-5) FPGA. We found that scaling vertically within a 4×4 sub-block is more efficient than scaling horizontally across several sub-blocks. As a result, we were able to achieve higher video resolutions at lower hardware resource cost. In particular, we show that the best vertically scaled design can achieve 30 fps of QSXGA video with 4 reference frames using only 25.5K LUTs and 28.7K registers.
► Explored Fractional Motion Estimation (FME) algorithm parallelization.
► Implemented six scaled designs on a Xilinx XC5VLX85T (Virtex-5) FPGA.
► Found that scaling vertically within a 4×4 sub-block is more efficient.
► Scaling horizontally across several sub-blocks is less efficient.
► Best vertically scaled design achieves 30 fps at QSXGA with 4 reference frames.
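To give a sense of scale for the headline figure (30 fps of QSXGA video with 4 reference frames), the following rough sketch estimates how many macroblock refinements per second such a design would have to sustain. It is not taken from the paper: it assumes QSXGA means 2560×2048 luma samples, that frames are split into 16×16 macroblocks, and that each macroblock is refined once against every reference frame. The function name `fme_workload` is hypothetical.

```python
# Back-of-the-envelope workload estimate (illustrative assumptions only).
MB_SIZE = 16  # an H.264 macroblock covers 16x16 luma samples


def fme_workload(width, height, fps, ref_frames):
    """Return (macroblocks per frame, macroblock refinements per second)."""
    # Ceiling division so partial macroblocks at the frame edge are counted.
    mbs_x = -(-width // MB_SIZE)
    mbs_y = -(-height // MB_SIZE)
    mbs_per_frame = mbs_x * mbs_y
    # Assumption: one fractional refinement per macroblock per reference frame.
    refinements_per_second = mbs_per_frame * fps * ref_frames
    return mbs_per_frame, refinements_per_second


# Assumed QSXGA resolution: 2560x2048.
mbs, rate = fme_workload(2560, 2048, fps=30, ref_frames=4)
print(f"{mbs} macroblocks/frame, {rate} refinements/s")
```

Under these assumptions the design would need to handle roughly 2.46 million macroblock refinements per second. The actual per-macroblock cost depends on the partition modes and search points evaluated, which the abstract does not specify.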
ISSN: 0167-9260, 1872-7522
DOI: 10.1016/j.vlsi.2011.11.017