Efficient Algorithm Adaptations and Fully Parallel Hardware Architecture of H.265/HEVC Intra Encoder


Bibliographic details
Published in: IEEE Transactions on Circuits and Systems for Video Technology, 2019-11, Vol. 29 (11), p. 3415-3429
Main authors: Zhang, Yuanzhi; Lu, Chao
Format: Article
Language: English
Description
Summary: The growing demand for high-performance ultra-high-definition video coding led to H.265/High Efficiency Video Coding (HEVC), whose increased computational complexity and data/timing dependencies limit coding throughput. To address these challenges, this paper presents four algorithm adaptations and a fully parallel hardware architecture for an H.265/HEVC intra encoder. To the best of our knowledge, this is the first fully parallel H.265/HEVC intra encoder. The design supports all 35 intra prediction modes and all coding tree unit (CTU) partitions. All prediction units (PUs) are processed independently in four prediction engines for high parallelism. Each prediction engine is assigned an appropriate set of intra prediction modes, rate-distortion optimization (RDO) candidates, and CABAC rate-estimation instances, and its internal computational tasks are pipelined and scheduled to maximize processing throughput. Compared with the HM-15.0 reference software, the proposed algorithm adaptations reduce the computational workload by 27%, with an average BD-rate of 4.39% and an average BD-PSNR of -0.21 dB. This BD-rate is lower than that of existing designs targeting the same video resolution. An FPGA implementation of the proposed design operates at 120 MHz and encodes 1080p video sequences at 45 fps using 201 K logic elements and 120 KB of on-chip SRAM. An ASIC implementation in TSMC 90-nm technology reaches a clock frequency of 320 MHz with a hardware gate count of 2288 K and supports real-time encoding of 4K video sequences at 30 fps. Compared with state-of-the-art designs, the proposed design demonstrates advantages in computational complexity, bit rate, video quality, throughput, reliability, and flexibility.
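The two throughput figures in the summary can be cross-checked with a short calculation of the clock-cycle budget per pixel. This is only a sketch: the frame dimensions (1920x1080 for 1080p, 3840x2160 for 4K) and the choice to count luma samples only are assumptions, since the summary gives neither, and the function name `cycles_per_pixel` is made up for illustration.

```python
def cycles_per_pixel(clock_hz, width, height, fps):
    """Clock cycles available per luma pixel at the given clock and frame rate."""
    pixels_per_second = width * height * fps
    return clock_hz / pixels_per_second


# Figures quoted in the summary; frame sizes are assumed, not stated there.
fpga = cycles_per_pixel(120e6, 1920, 1080, 45)  # FPGA: 120 MHz, 1080p at 45 fps
asic = cycles_per_pixel(320e6, 3840, 2160, 30)  # ASIC: 320 MHz, 4K at 30 fps

print(f"FPGA budget: {fpga:.3f} cycles/pixel")
print(f"ASIC budget: {asic:.3f} cycles/pixel")
```

Under these assumptions both implementations come out at roughly 1.29 cycles per pixel, which suggests the FPGA and ASIC numbers describe the same architecture running at different clock rates.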
ISSN: 1051-8215, 1558-2205
DOI: 10.1109/TCSVT.2018.2878399