On Parallelizing H.264/AVC Rate-Distortion Optimization Baseline Profile Encoder


Bibliographic details
Published in: Journal of Information Science and Engineering, 2010-03, Vol. 26 (2), p. 409-426
Main authors: 王景新 (Jing-Xin Wang), 邱永昌 (Yung-Chang Chiu), 蘇文鈺 (Alvin W. Y. Su), 謝錫堃 (Ce-Kuen Shieh)
Format: Article
Language: English
Online access: Full text
Description
Summary: An H.264/AVC encoder can incorporate many coding schemes, such as rate-distortion optimization (RDO), to improve its compression performance, but these dramatically raise computational complexity. In an H.264/AVC RDO encoder, most computation time is spent calculating the rate-distortion cost used to choose the optimal coding mode for both inter and intra coding. Parallel computation is one way to speed up the encoder. However, calculating rate-distortion costs requires a large amount of reference data from previously coded adjacent macroblocks in order to maintain the coding efficiency established by the JM encoder, which is an undesirable property for any parallel computing strategy. Transmitting such a large amount of reference data, and doing so frequently between processing nodes, slows the entire encoding process. It may therefore become necessary to drop part of the reference data and transmit less often in order to reduce traffic. To investigate this problem, this study implements the H.264/AVC RDO encoder using three different parallel schemes. Each scheme is run on a software DSM-based (distributed shared memory) PC cluster consisting of one to five PCs (one master node, with or without one or more slave processing nodes). The amount of data exchanged among processing nodes is analyzed for each scheme, and PSNR performance and speedup results are provided for each. Experiments show that a considerable reduction in coding gain is to be expected as more information is dropped. At lower bit rates, performance falls to the level of a regular H.264 encoder. Nevertheless, this paper provides a good reference for implementing such an encoder on a cluster computing system.
ISSN: 1016-2364
DOI: 10.6688/JISE.2010.26.2.6