Search | arXiv e-print repository

doi 10.1109/TCSVT.2013.2270397

A Single-Channel Architecture for Algebraic Integer Based 8$\times$8 2-D DCT Computation

Authors: A. Edirisuriya, A. Madanayake, R. J. Cintra, V. S. Dimitrov

Abstract: An area efficient row-parallel architecture is proposed for the real-time implementation of bivariate algebraic integer (AI) encoded 2-D discrete cosine transform (DCT) for image and video processing. The proposed architecture computes 8$\times$8 2-D DCT transform based on the Arai DCT algorithm. An improved fast algorithm for AI based 1-D DCT computation is proposed along with a single channel 2-… ▽ More An area efficient row-parallel architecture is proposed for the real-time implementation of bivariate algebraic integer (AI) encoded 2-D discrete cosine transform (DCT) for image and video processing. The proposed architecture computes 8$\times$8 2-D DCT transform based on the Arai DCT algorithm. An improved fast algorithm for AI based 1-D DCT computation is proposed along with a single channel 2-D DCT architecture. The design improves on the 4-channel AI DCT architecture that was published recently by reducing the number of integer channels to one and the number of 8-point 1-D DCT cores from 5 down to 2. The architecture offers exact computation of 8$\times$8 blocks of the 2-D DCT coefficients up to the FRS, which converts the coefficients from the AI representation to fixed-point format using the method of expansion factors. Prototype circuits corresponding to FRS blocks based on two expansion factors are realized, tested, and verified on FPGA-chip, using a Xilinx Virtex-6 XC6VLX240T device. Post place-and-route results show a 20% reduction in terms of area compared to the 2-D DCT architecture requiring five 1-D AI cores. The area-time and area-time${}^2$ complexity metrics are also reduced by 23% and 22% respectively for designs with 8-bit input word length. The digital realizations are simulated up to place and route for ASICs using 45 nm CMOS standard cells. The maximum estimated clock rate is 951 MHz for the CMOS realizations indicating 7.608$\cdot$10$^9$ pixels/seconds and a 8$\times$8 block rate of 118.875 MHz. △ Less

Submitted 26 October, 2017; originally announced October 2017.

Comments: 8 pages, 6 figures, 5 tables

Journal ref: IEEE Transactions on Circuits and Systems for Video Technology, volume 23, number 12, pages 2083-2089, Dec. 2013

arXiv:1702.01805 [pdf, other]

doi 10.1088/0957-0233/23/11/114010

A Digital Hardware Fast Algorithm and FPGA-based Prototype for a Novel 16-point Approximate DCT for Image Compression Applications

Authors: F. M. Bayer, R. J. Cintra, A. Edirisuriya, A. Madanayake

Abstract: The discrete cosine transform (DCT) is the key step in many image and video coding standards. The 8-point DCT is an important special case, possessing several low-complexity approximations widely investigated. However, 16-point DCT transform has energy compaction advantages. In this sense, this paper presents a new 16-point DCT approximation with null multiplicative complexity. The proposed transf… ▽ More The discrete cosine transform (DCT) is the key step in many image and video coding standards. The 8-point DCT is an important special case, possessing several low-complexity approximations widely investigated. However, 16-point DCT transform has energy compaction advantages. In this sense, this paper presents a new 16-point DCT approximation with null multiplicative complexity. The proposed transform matrix is orthogonal and contains only zeros and ones. The proposed transform outperforms the well-know Walsh-Hadamard transform and the current state-of-the-art 16-point approximation. A fast algorithm for the proposed transform is also introduced. This fast algorithm is experimentally validated using hardware implementations that are physically realized and verified on a 40 nm CMOS Xilinx Virtex-6 XC6VLX240T FPGA chip for a maximum clock rate of 342 MHz. Rapid prototypes on FPGA for 8-bit input word size shows significant improvement in compressed image quality by up to 1-2 dB at the cost of only eight adders compared to the state-of-art 16-point DCT approximation algorithm in the literature [S. Bouguezel, M. O. Ahmad, and M. N. S. Swamy. A novel transform for image compression. In {\em Proceedings of the 53rd IEEE International Midwest Symposium on Circuits and Systems (MWSCAS)}, 2010]. △ Less

Submitted 6 February, 2017; originally announced February 2017.

Comments: 17 pages, 6 figures, 6 tables

Journal ref: Measurement Science and Technology, Volume 23, Number 11, 2012

arXiv:1502.04221 [pdf, ps, other]

doi 10.1109/TCSVT.2011.2181232

A Row-parallel 8$\times$8 2-D DCT Architecture Using Algebraic Integer Based Exact Computation

Authors: A. Madanayake, R. J. Cintra, D. Onen, V. S. Dimitrov, N. T. Rajapaksha, L. T. Bruton, A. Edirisuriya

Abstract: An algebraic integer (AI) based time-multiplexed row-parallel architecture and two final-reconstruction step (FRS) algorithms are proposed for the implementation of bivariate AI-encoded 2-D discrete cosine transform (DCT). The architecture directly realizes an error-free 2-D DCT without using FRSs between row-column transforms, leading to an 8$\times$8 2-D DCT which is entirely free of quantizatio… ▽ More An algebraic integer (AI) based time-multiplexed row-parallel architecture and two final-reconstruction step (FRS) algorithms are proposed for the implementation of bivariate AI-encoded 2-D discrete cosine transform (DCT). The architecture directly realizes an error-free 2-D DCT without using FRSs between row-column transforms, leading to an 8$\times$8 2-D DCT which is entirely free of quantization errors in AI basis. As a result, the user-selectable accuracy for each of the coefficients in the FRS facilitates each of the 64 coefficients to have its precision set independently of others, avoiding the leakage of quantization noise between channels as is the case for published DCT designs. The proposed FRS uses two approaches based on (i) optimized Dempster-Macleod multipliers and (ii) expansion factor scaling. This architecture enables low-noise high-dynamic range applications in digital video processing that requires full control of the finite-precision computation of the 2-D DCT. The proposed architectures and FRS techniques are experimentally verified and validated using hardware implementations that are physically realized and verified on FPGA chip. Six designs, for 4- and 8-bit input word sizes, using the two proposed FRS schemes, have been designed, simulated, physically implemented and measured. The maximum clock rate and block-rate achieved among 8-bit input designs are 307.787 MHz and 38.47 MHz, respectively, implying a pixel rate of 8$\times$307.787$\approx$2.462 GHz if eventually embedded in a real-time video-processing system. The equivalent frame rate is about 1187.35 Hz for the image size of 1920$\times$1080. All implementations are functional on a Xilinx Virtex-6 XC6VLX240T FPGA device. △ Less

Submitted 14 February, 2015; originally announced February 2015.

Comments: 28 pages, 9 figures, 7 tables, corrected typos

Journal ref: IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 6, pp. 915--929, 2012

arXiv:1501.02995 [pdf, ps, other]

doi 10.1109/TCSI.2013.2295022

Improved 8-point Approximate DCT for Image and Video Compression Requiring Only 14 Additions

Authors: U. S. Potluri, A. Madanayake, R. J. Cintra, F. M. Bayer, S. Kulasekera, A. Edirisuriya

Abstract: Video processing systems such as HEVC requiring low energy consumption needed for the multimedia market has lead to extensive development in fast algorithms for the efficient approximation of 2-D DCT transforms. The DCT is employed in a multitude of compression standards due to its remarkable energy compaction properties. Multiplier-free approximate DCT transforms have been proposed that offer sup… ▽ More Video processing systems such as HEVC requiring low energy consumption needed for the multimedia market has lead to extensive development in fast algorithms for the efficient approximation of 2-D DCT transforms. The DCT is employed in a multitude of compression standards due to its remarkable energy compaction properties. Multiplier-free approximate DCT transforms have been proposed that offer superior compression performance at very low circuit complexity. Such approximations can be realized in digital VLSI hardware using additions and subtractions only, leading to significant reductions in chip area and power consumption compared to conventional DCTs and integer transforms. In this paper, we introduce a novel 8-point DCT approximation that requires only 14 addition operations and no multiplications. The proposed transform possesses low computational complexity and is compared to state-of-the-art DCT approximations in terms of both algorithm complexity and peak signal-to-noise ratio. The proposed DCT approximation is a candidate for reconfigurable video standards such as HEVC. The proposed transform and several other DCT approximations are mapped to systolic-array digital architectures and physically realized as digital prototype circuits using FPGA technology and mapped to 45 nm CMOS technology. △ Less

Submitted 13 January, 2015; originally announced January 2015.

Comments: 30 pages, 7 figures, 5 tables

Journal ref: Circuits and Systems I: Regular Papers, IEEE Transactions on, Volume 61, Issue 6, June 2014, 1727--1740

Showing 1–4 of 4 results for author: Edirisuriya, A