The Bjøntegaard Bible -- Why your Way of Comparing Video Codecs May Be Wrong
Main authors: | , , , , , , , , , , |
---|---|
Format: | Article |
Language: | eng |
Summary: | In this paper, we provide an in-depth assessment of the Bjøntegaard Delta.
We construct a large data set of video compression performance comparisons
using a diverse set of metrics including PSNR, VMAF, bitrate, and processing
energies. These metrics are evaluated for visual data types such as classic
perspective video, 360° video, point clouds, and screen content. As
compression technologies, we consider multiple hybrid video codecs as well as
state-of-the-art neural-network-based compression methods. Using additional
supporting points in between the standard points defined by parameters such as
the quantization parameter, we assess the interpolation error of the
Bjøntegaard-Delta (BD) calculus and its impact on the final BD value. From
the analysis, we find that the BD calculus is most accurate in the standard
application of rate-distortion comparisons, with mean errors below 0.5
percentage points. For other applications and special cases, e.g., VMAF
quality, energy considerations, or inter-codec comparisons, the errors are
higher (up to 5 percentage points), but can be halved by using a higher number
of supporting points. We conclude with recommendations on how to use the
BD calculus such that the validity of the resulting BD values is maximized.
The main recommendations are as follows: First, relative curve differences
should be plotted and analyzed. Second, the logarithmic domain should be used
for saturating metrics such as SSIM and VMAF. Third, BD values below a certain
threshold indicated by the subset error should not be used to derive
recommendations. Fourth, using two supporting points is sufficient to obtain
rough performance estimates. |
---|---|
DOI: | 10.48550/arxiv.2304.12852 |
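
To make the BD calculus described in the summary more concrete, the following is a minimal sketch of a BD-rate computation. It is not the authors' implementation. The classic Bjøntegaard method fits a cubic polynomial to the log-rate over quality; this sketch swaps in piecewise-linear interpolation so that it needs only the Python standard library and also works with just two supporting points per curve. The function names (`bd_rate`, `_interp`, `_integrate`) and all sample values are hypothetical and chosen for illustration only.

```python
import math


def _interp(x, xs, ys):
    """Linearly interpolate y at x; xs must be strictly increasing."""
    for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x lies outside the range covered by the curve")


def _integrate(xs, ys, lo, hi):
    """Exact integral of the piecewise-linear curve over [lo, hi]."""
    knots = [lo] + [x for x in xs if lo < x < hi] + [hi]
    vals = [_interp(x, xs, ys) for x in knots]
    return sum(
        (b - a) * (va + vb) / 2.0
        for a, b, va, vb in zip(knots, knots[1:], vals, vals[1:])
    )


def bd_rate(rates_ref, quality_ref, rates_test, quality_test):
    """Average rate difference in percent of test vs. reference codec.

    Negative values mean the test codec needs less rate for the same
    quality. Simplification (assumption): piecewise-linear interpolation
    of log-rate over quality instead of a cubic or Akima fit.
    """
    ref = sorted(zip(quality_ref, rates_ref))
    test = sorted(zip(quality_test, rates_test))
    q_ref = [q for q, _ in ref]
    q_test = [q for q, _ in test]
    # Rates are compared in the logarithmic domain.
    log_ref = [math.log(r) for _, r in ref]
    log_test = [math.log(r) for _, r in test]

    # Only the quality interval covered by both curves is compared.
    lo = max(q_ref[0], q_test[0])
    hi = min(q_ref[-1], q_test[-1])
    if hi <= lo:
        raise ValueError("the two curves do not overlap in quality")

    avg_log_diff = (
        _integrate(q_test, log_test, lo, hi)
        - _integrate(q_ref, log_ref, lo, hi)
    ) / (hi - lo)
    return (math.exp(avg_log_diff) - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical rate (kbit/s) and PSNR (dB) points, one per QP value.
    psnr = [32.0, 35.0, 38.0, 41.0]
    rates_a = [1000.0, 2000.0, 4000.0, 8000.0]
    rates_b = [0.9 * r for r in rates_a]  # codec B saves 10% everywhere
    print(bd_rate(rates_a, psnr, rates_b, psnr))  # approximately -10.0
```

For a saturating metric such as SSIM or VMAF, the second recommendation in the summary would correspond to transforming the quality values into a logarithmic domain before passing them in; which exact transform to use is left open here, since the summary does not specify it.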