A Generalized Algorithm and Reconfigurable Architecture for Efficient and Scalable Orthogonal Approximation of DCT
Published in: | IEEE Transactions on Circuits and Systems I: Regular Papers, 2015-02, Vol. 62 (2), p. 449-457 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | eng |
Subjects: | |
Abstract: | Approximating the discrete cosine transform (DCT) reduces its computational complexity without significantly affecting coding performance. Most existing DCT approximation algorithms target only small transform lengths, and some of them are non-orthogonal. This paper presents a generalized recursive algorithm for orthogonal approximation of the DCT, in which an approximate DCT of length N is derived from a pair of DCTs of length N/2 at the cost of N additions for input preprocessing. The algorithm is derived by recursive sparse matrix decomposition, exploiting the symmetries of the DCT basis vectors. It is highly scalable for both hardware and software implementation of longer DCTs, and it can use an existing approximation of the 8-point DCT to obtain an approximate DCT of any power-of-two length N > 8. We demonstrate that the proposed approximation provides comparable or better image and video compression performance than existing approximation methods, and we show that it involves lower arithmetic complexity than other existing approximation algorithms. We also present a fully scalable, reconfigurable parallel architecture for computing the approximate DCT based on the proposed algorithm. A distinctive feature of the design is that it can be configured to compute one 32-point DCT, two 16-point DCTs in parallel, or four 8-point DCTs in parallel, with marginal control overhead. The architecture offers advantages in hardware complexity, regularity, and modularity. Experimental results from an FPGA implementation show the advantage of the proposed method. |
---|---|
ISSN: | 1549-8328 1558-0806 |
DOI: | 10.1109/TCSI.2014.2360763 |
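
Illustrative sketch (not part of the catalog record): the recursion described in the abstract can be outlined in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation. The abstract states only that a length-N approximation is built from two length-N/2 transforms plus N additions of preprocessing; the even/odd output interleaving, the 1/sqrt(2) scaling, and the use of an exact 8-point DCT as the base kernel (in place of an approximate 8-point DCT) are assumptions made here for illustration. The names `dct8_exact`, `approx_dct`, and `transform_32_lane` are hypothetical.

```python
import math


def dct8_exact(x):
    """Orthonormal DCT-II of the input (assumed stand-in for an approximate 8-point kernel)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        out.append(scale * acc)
    return out


def approx_dct(x, base=dct8_exact, base_len=8):
    """Recursive approximate DCT for len(x) = base_len * 2**m (sketch, see assumptions above)."""
    n = len(x)
    if n == base_len:
        return base(x)
    if n < base_len or n % 2:
        raise ValueError("length must be base_len times a power of two")
    half = n // 2
    # Input preprocessing: N/2 additions plus N/2 subtractions, N operations in total.
    sums = [x[i] + x[n - 1 - i] for i in range(half)]
    diffs = [x[i] - x[n - 1 - i] for i in range(half)]
    # The same half-length transform is applied to both halves.
    even = approx_dct(sums, base, base_len)
    odd = approx_dct(diffs, base, base_len)
    # Assumed output ordering: even-indexed outputs from the sums, odd-indexed from the differences.
    norm = 1.0 / math.sqrt(2.0)  # assumed scaling that keeps the transform orthonormal
    out = [0.0] * n
    for k in range(half):
        out[2 * k] = norm * even[k]
        out[2 * k + 1] = norm * odd[k]
    return out


def transform_32_lane(x, mode, base=dct8_exact):
    """Software analogy of the three modes: one 32-point, two 16-point, or four 8-point transforms."""
    if len(x) != 32 or mode not in (8, 16, 32):
        raise ValueError("expected 32 inputs and mode in {8, 16, 32}")
    return [approx_dct(x[i:i + mode], base) for i in range(0, 32, mode)]
```

In this sketch every mode of `transform_32_lane` ends up calling the 8-point kernel exactly four times, which gives a rough software picture of why one datapath could serve all three configurations. The preprocessing step is orthogonal up to a factor of sqrt(2), so the result stays orthogonal whenever the base kernel is orthogonal.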