End-to-End Learning-Based Image Compression With a Decoupled Framework

Bibliographic details
Published in: IEEE Transactions on Circuits and Systems for Video Technology, 2024-05, Vol. 34 (5), pp. 3067-3081
Main authors: Zhang, Zhaobin; Esenlik, Semih; Wu, Yaojun; Wang, Meng; Zhang, Kai; Zhang, Li
Format: Article
Language: English
Description
Summary: The autoregressive model has been widely used in learning-based image compression due to its superior context modeling capability. However, its sequential processing nature also undermines the ability to decode in parallel and hinders deployment in real applications. In this paper, we propose a decoupled framework to resolve this issue. With the decoupled architecture, the entropy decoding process is independent of the latent sample reconstruction process. The entropy decoding process can thus be finished before the latent sample prediction process begins, which leads to significant decoding time savings by enabling the two processes to be conducted in parallel. To further reduce the decoding time, we introduce wavefront processing, where multiple rows can be processed simultaneously when reconstructing the latent samples. On top of that, we design a series of coding tools to improve the rate-distortion efficiency and reduce the decoding complexity. The proposed solution also supports device interoperability: the same bitstream can be successfully decoded on different CPU/GPU devices. Comprehensive experiments are conducted to validate the effectiveness of the proposed method. Using the objective evaluation metrics required by the JPEG AI Call for Proposals (CfP), the proposed method achieves a BD-rate change of −29.6% on average with 2.44 times faster decoding speed compared to VVC image coding. Compared to commonly used benchmark learning-based methods, the proposed method achieves a BD-rate change of −30.5% and 101 times faster decoding speed over cheng2020attn. The proposed solution has been submitted to JPEG AI and IEEE 1857.11 in response to the CfP, and the core techniques have been adopted by both.
ISSN: 1051-8215, 1558-2205
DOI:10.1109/TCSVT.2023.3313974
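
The wavefront processing mentioned in the summary can be illustrated with a small scheduling sketch. This is not the authors' implementation: the function name `wavefront_schedule`, the `lag` parameter, and the assumed dependency pattern (each latent position depends on its left neighbour and on the row above up to `lag - 1` columns to the right) are illustrative assumptions, since the record does not specify the context shape used in the paper.

```python
def wavefront_schedule(height, width, lag=2):
    """Group latent positions (row, col) by the step at which they can run.

    Assumption (not taken from the paper): position (r, c) depends only on
    (r, c - 1) and on positions in row r - 1 up to column c + lag - 1.
    Under that assumption, row r may trail row r - 1 by `lag` columns,
    so position (r, c) becomes ready at step c + lag * r.
    """
    n_steps = (width - 1) + lag * (height - 1) + 1
    steps = [[] for _ in range(n_steps)]
    for r in range(height):
        for c in range(width):
            steps[c + lag * r].append((r, c))
    return steps


if __name__ == "__main__":
    schedule = wavefront_schedule(height=4, width=8, lag=2)
    for t, positions in enumerate(schedule):
        # All positions listed for one step are mutually independent
        # and could be reconstructed simultaneously.
        print(t, positions)
```

Under these assumptions, a 4 x 8 latent grid needs 14 steps instead of the 32 required by strictly sequential raster-order processing, with up to four rows active at once. The achievable gain in practice depends on the real context model and on how many positions the hardware can process in parallel.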