CNNs for JPEGs: A Study in Computational Cost
Main authors: | , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Convolutional neural networks (CNNs) have achieved astonishing advances over
the past decade, defining the state of the art in several computer vision tasks.
CNNs are capable of learning robust representations of the data directly from
the RGB pixels. However, most image data are available in compressed
format, of which JPEG is the most widely used for transmission and
storage purposes, and this demands a preliminary decoding process that has a high
computational load and memory usage. For this reason, deep learning methods
capable of learning directly from the compressed domain have been gaining
attention in recent years. Those methods usually extract a frequency domain
representation of the image, such as the DCT, by partial decoding, and then
adapt typical CNN architectures to work with it. One limitation of
these works is that, in order to accommodate the frequency domain data,
the modifications made to the original model significantly increase its
number of parameters and computational complexity. On the one hand, the methods
have faster preprocessing, since the cost of fully decoding the images is
avoided; on the other hand, the cost of passing the images through the model
is increased, diminishing the potential gain in speed. In
this paper, we propose a further study of the computational cost of deep models
designed for the frequency domain, evaluating the cost of decoding and passing
the images through the network. We also propose handcrafted and data-driven
techniques for reducing the computational complexity and the number of
parameters for these models in order to keep them similar to their RGB
baselines, leading to efficient models with a better trade-off between
computational cost and accuracy. |
---|---|
DOI: | 10.48550/arxiv.2309.11417 |