-
General Framework for Array Noise Analysis and Noise Performance of a Two-Element Interferometer With a Mutual-Coupling Canceler
Authors:
Leonid Belostotski,
Adrian T. Sutinjo,
Ravi Subrahmanyan,
Soumyajit Mandal,
Arjuna Madanayake
Abstract:
This article investigates the noise performance of a two-element phased array and interferometer containing a recently introduced self-interference canceler, which in the context of this work acts as a mutual-coupling canceler. To this end, a general framework is proposed to permit noise analysis of this network and a large variety of other networks. The framework-based numerical analysis for a two-element phased array shows that the addition of the canceler significantly increases the beam-equivalent noise temperature. For a two-element interferometer used in cosmology, this increase in noise temperature is still acceptable because the sky noise temperature in the 20-to-200 MHz band is high. When used in an interferometer, the canceler provides the ability to null mutual coherence at the interferometer output. The use of matching to reduce the sensitivity of this null in mutual coherence to the phase of the 90° hybrids in the canceler is also discussed.
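The abstract argues that a higher receiver noise temperature is tolerable because the 20-to-200 MHz sky is bright. A back-of-the-envelope sketch can check this. It assumes the commonly used power-law approximation T_sky ≈ 60 λ^2.55 K for the diffuse Galactic background and takes T_sys = T_sky + T_rx. The receiver temperatures (100 K rising to 300 K) are hypothetical and are not outputs of the paper's noise framework.

```python
C = 299_792_458.0  # speed of light, m/s


def sky_temperature_k(freq_hz: float) -> float:
    """Approximate diffuse Galactic sky temperature.

    Assumption: power-law model T_sky ~ 60 * lambda**2.55 K (lambda in metres).
    The real sky temperature varies with pointing and sidereal time.
    """
    wavelength_m = C / freq_hz
    return 60.0 * wavelength_m ** 2.55


def system_temperature_k(freq_hz: float, receiver_temp_k: float) -> float:
    """Sky-dominated system temperature T_sys = T_sky + T_rx."""
    return sky_temperature_k(freq_hz) + receiver_temp_k


def tsys_increase_percent(freq_hz, t_rx_before_k, t_rx_after_k):
    """Relative growth of T_sys when the receiver noise rises (hypothetical values)."""
    before = system_temperature_k(freq_hz, t_rx_before_k)
    after = system_temperature_k(freq_hz, t_rx_after_k)
    return 100.0 * (after - before) / before


for f_mhz in (20, 50, 100, 200):
    f = f_mhz * 1e6
    print(f"{f_mhz:>4} MHz: T_sky ~ {sky_temperature_k(f):8.0f} K, "
          f"T_sys increase for a 100 K -> 300 K receiver: "
          f"{tsys_increase_percent(f, 100.0, 300.0):5.1f} %")
```

Under this model, a 200 K rise in receiver noise changes T_sys by well under 1% at 20 MHz. The relative penalty grows toward 200 MHz as the sky cools.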
Submitted 1 April, 2025;
originally announced April 2025.
-
A Low-complexity Structured Neural Network Approach to Intelligently Realize Wideband Multi-beam Beamformers
Authors:
Hansaka Aluvihare,
Sivakumar Sivasankar,
Xianqi Li,
Arjuna Madanayake,
Sirani M. Perera
Abstract:
True-time-delay (TTD) beamformers can produce wideband, squint-free beams in both analog and digital signal domains, unlike frequency-dependent FFT beams. Our previous work showed that TTD beamformers can be efficiently realized using the elements of the delay Vandermonde matrix (DVM), addressing the longstanding beam-squint problem. Building on our work on classical algorithms based on the DVM, we propose a neural network (NN) architecture to realize wideband multi-beam beamformers using structure-imposed weight matrices and submatrices. The structure and sparsity of the weight matrices and submatrices are shown to greatly reduce the space and computational complexities of the NN. The proposed network architecture has $\mathcal{O}(pLM\log M)$ complexity, compared to $\mathcal{O}(M^2 L)$ for a conventional fully connected $L$-layer network, where $M$ is the number of nodes in each layer of the network, $p$ is the number of submatrices per layer, and $M \gg p$. We show numerical simulations in the 24 GHz to 32 GHz range to demonstrate the numerical feasibility of realizing wideband multi-beam beamformers using the proposed neural architecture. We also quantify the complexity reduction of the proposed NN and compare it with fully connected NNs to show the efficiency of the proposed architecture without sacrificing accuracy. The accuracy of the proposed NN architecture is measured using the mean squared error, based on an objective function of the weight matrices and beamformed signals of antenna arrays, while also normalizing nodes. The proposed NN architecture realizes wideband multi-beam beamformers in real time with low complexity, making it suitable for low-complexity intelligent systems.
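The squint-free property that motivates the DVM can be seen numerically. In this sketch, each beam is one row of a Vandermonde matrix whose nodes are α_k = exp(j2πfτ_k). Because the TTD rows are rebuilt at every frequency f, the beam peak stays at the steering angle. A phase-shifter beam with weights frozen at the design frequency drifts instead. The array size, spacing, and 28 GHz design frequency are hypothetical choices within the abstract's 24 to 32 GHz range. The structured NN itself is not reproduced here.

```python
import numpy as np

C = 299_792_458.0  # speed of light, m/s


def delay_vandermonde(freq_hz, n_elem, spacing_m, steer_deg):
    """Rows are TTD beams: A[k, n] = alpha_k**n with alpha_k = exp(j*2*pi*f*tau_k)."""
    tau = spacing_m * np.sin(np.deg2rad(steer_deg)) / C    # inter-element delay of beam k
    alpha = np.exp(1j * 2 * np.pi * freq_hz * tau)           # Vandermonde nodes
    return alpha[:, None] ** np.arange(n_elem)[None, :]


def plane_wave(freq_hz, n_elem, spacing_m, angle_deg):
    """Element signals of a plane wave arriving from angle_deg."""
    n = np.arange(n_elem)
    return np.exp(-1j * 2 * np.pi * freq_hz * n * spacing_m
                  * np.sin(np.deg2rad(angle_deg)) / C)


def beam_peak_deg(weights_row, freq_hz, n_elem, spacing_m, scan_deg):
    """Scan angle where the beam response |w . x(angle)| is largest."""
    resp = [abs(weights_row @ plane_wave(freq_hz, n_elem, spacing_m, a)) for a in scan_deg]
    return scan_deg[int(np.argmax(resp))]


FC_HZ = 28e9                               # hypothetical design frequency
N_ELEM = 16                                # hypothetical array size
SPACING_M = C / FC_HZ / 2                  # half-wavelength spacing at FC_HZ
SCAN_DEG = np.linspace(-60.0, 60.0, 1201)  # 0.1-degree scan grid
STEER_DEG = np.array([30.0])

ps_weights = delay_vandermonde(FC_HZ, N_ELEM, SPACING_M, STEER_DEG)[0]  # phases frozen at FC
ttd_peaks, ps_peaks = {}, {}
for f in (24e9, 28e9, 32e9):
    ttd_weights = delay_vandermonde(f, N_ELEM, SPACING_M, STEER_DEG)[0]  # true delays scale with f
    ttd_peaks[f] = beam_peak_deg(ttd_weights, f, N_ELEM, SPACING_M, SCAN_DEG)
    ps_peaks[f] = beam_peak_deg(ps_weights, f, N_ELEM, SPACING_M, SCAN_DEG)
    print(f"{f / 1e9:.0f} GHz: TTD peak {ttd_peaks[f]:.1f} deg, "
          f"phase-shifter peak {ps_peaks[f]:.1f} deg")
```

At 24 GHz, the frozen phase-shifter beam points near 35.7° rather than 30°. This is the beam squint that TTD (DVM) beamformers avoid.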
Submitted 26 March, 2025;
originally announced March 2025.
-
Fast Data-independent KLT Approximations Based on Integer Functions
Authors:
A. P. Radünz,
D. F. G. Coelho,
F. M. Bayer,
R. J. Cintra,
A. Madanayake
Abstract:
The Karhunen-Loève transform (KLT) is a well-established discrete transform with optimal characteristics in data decorrelation and dimensionality reduction. Its ability to compact signal energy into a few principal components has made it instrumental in various image compression frameworks. However, computing the KLT depends on the covariance matrix of the input data, which makes it difficult to develop fast algorithms for its implementation. Approximations of the KLT that use specific rounding functions have been introduced to reduce its computational complexity. In this paper, we introduce a category of low-complexity, data-independent KLT approximations employing a range of round-off functions. The design methodology of the approximate transform is defined for any block length $N$, but emphasis is given to $N = 8$ due to its wide use in image and video compression. The proposed transforms perform well compared with the exact KLT and existing approximations under classical performance measures. In particular scenarios, our proposed transforms demonstrated superior performance compared to KLT approximations documented in the literature. We also developed fast algorithms for the proposed transforms, further reducing the arithmetic cost associated with their implementation. Field-programmable gate array (FPGA) hardware implementation metrics were also evaluated. Practical applications in image encoding showed the relevance of the proposed transforms. In fact, we showed that one of the proposed transforms outperformed the exact KLT at certain compression ratios.
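A data-independent KLT approximation is often built from a fixed covariance model rather than from the actual input. The sketch below illustrates that idea under explicit assumptions: a first-order Markov (AR(1)) covariance with ρ = 0.95, a scaling factor α = 2, and `np.round` as the round-off function. All three are hypothetical choices, and the paper's specific round-off functions and fast algorithms are not reproduced. The helper names are illustrative.

```python
import numpy as np


def ar1_covariance(n: int, rho: float) -> np.ndarray:
    """Covariance of a first-order Markov (AR(1)) source: R[i, j] = rho**|i - j|."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def exact_klt(n: int, rho: float) -> np.ndarray:
    """KLT matrix: rows are eigenvectors of R, ordered by decreasing eigenvalue."""
    eigvals, eigvecs = np.linalg.eigh(ar1_covariance(n, rho))
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order].T


def integer_klt_approx(n, rho, alpha, round_fn=np.round):
    """Hypothetical low-complexity matrix T = round_fn(alpha * K) with integer entries."""
    return round_fn(alpha * exact_klt(n, rho)).astype(int)


def orthonormalize_rows(t: np.ndarray) -> np.ndarray:
    """K_hat = S @ T with S = diag(1 / ||t_k||), so every row has unit norm."""
    return t / np.linalg.norm(t, axis=1, keepdims=True)


RHO = 0.95  # assumed AR(1) correlation coefficient
K = exact_klt(8, RHO)
T = integer_klt_approx(8, RHO, alpha=2.0)
K_hat = orthonormalize_rows(T)
print(T)
print("Frobenius distance to exact KLT:", np.linalg.norm(K_hat - K))
```

With α = 2 and plain rounding, every row keeps at least one nonzero entry, because a unit-norm 8-vector has some entry of magnitude at least 1/√8. Other round-off functions or smaller α values can zero out whole rows, so a practical design search would need to check for that.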
Submitted 11 October, 2024;
originally announced October 2024.
-
Revealing Invisible Scattering Poles via Complex Frequency Excitations
Authors:
Deepanshu Trivedi,
Arjuna Madanayake,
Alex Krasnok
Abstract:
Recent research in light scattering has prompted a re-evaluation of complex quantities, particularly in the context of complex frequency signals, which, unlike traditional harmonic signals, exhibit exponential growth or decay. We introduce a novel approach that uses complex frequency signals to reveal hidden or invisible poles (those with predominantly imaginary components) previously undetected in conventional scattering experiments. By employing a carefully tuned complex frequency excitation, we demonstrate the efficient conversion of non-oscillating fields into oscillating ones. This effect is shown in both the RF and optical domains, specifically within the infrared C-band, which is crucial for communications. This study enhances the theoretical framework of wave interactions in photonic systems, paving the way for innovative applications in invisibility cloaking, advanced photonic devices, and the future of optical communication and quantum computing.
Submitted 18 August, 2024;
originally announced August 2024.
-
Anomalies in Light Scattering: A Circuit Model Approach
Authors:
Deepanshu Trivedi,
Arjuna Madanayake,
Alex Krasnok
Abstract:
In experimental physics, understanding electromagnetic (EM) wave scattering across the EM spectrum, from radio waves to X-rays, is essential and pivotal in driving photonics innovations. Recent advancements have uncovered phenomena such as bound states in the continuum (BICs) and parity-time (PT) symmetric systems, which are closely associated with the characteristics of the scattering matrix and are governed by passivity and causality. The emergence of complex frequency excitations has transcended the constraints imposed by passivity and causality in a system, revealing effects such as virtual critical coupling and virtual gain. However, applying the concepts of complex frequency excitation to more complicated systems remains challenging. In this work, we extend the lumped-element model of circuit theory to the analysis of anomalies in light scattering in the complex frequency domain. We demonstrate that the circuit model approach can facilitate the design and analysis of effects such as virtual perfect absorption, BICs, real and virtual critical coupling, exceptional points, and anisotropic transmission resonances (ATRs). These findings broaden the understanding of EM wave phenomena and pave the way for significant advancements in photonics, offering new methods for designing and optimizing optical devices and systems with broad-ranging applications.
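Virtual perfect absorption can be illustrated with a toy circuit. A lossless series L-C load terminates a line of impedance Z0. On the real-frequency axis it reflects everything (|Γ| = 1). The zeros of Γ(s), however, sit at complex frequencies s = σ + jω with σ > 0, so an exponentially growing excitation e^{st} at such a zero produces no reflected wave in steady state. The sketch uses the circuit convention e^{st}, in which these zeros lie in the right half-plane. Under the e^{-iωt} physics convention they lie in the upper half-plane instead. The element values are hypothetical, and this is not the paper's circuit model.

```python
import numpy as np

# Hypothetical element values: a series L-C load terminating a Z0 line.
Z0 = 50.0     # ohms
L = 10e-9     # henries
C_F = 1e-12   # farads


def load_impedance(s):
    """Series LC impedance Z(s) = sL + 1/(sC); lossless, so it absorbs no real-frequency power."""
    return s * L + 1.0 / (s * C_F)


def reflection(s):
    """Reflection coefficient Gamma(s) = (Z - Z0) / (Z + Z0) at complex frequency s."""
    z = load_impedance(s)
    return (z - Z0) / (z + Z0)


# Zeros of Gamma solve L s^2 - Z0 s + 1/C = 0; poles solve L s^2 + Z0 s + 1/C = 0.
disc = np.sqrt(complex(Z0 ** 2 - 4 * L / C_F))
zeros = np.array([(Z0 + disc) / (2 * L), (Z0 - disc) / (2 * L)])
poles = np.array([(-Z0 + disc) / (2 * L), (-Z0 - disc) / (2 * L)])

print("zeros (growing excitations):", zeros)
print("poles (decaying resonances):", poles)
print("|Gamma| at a zero:", abs(reflection(zeros[0])))
print("|Gamma| at real frequency 7e9 rad/s:", abs(reflection(1j * 7e9)))
```

The zeros are mirror images of the poles across the imaginary axis. The load can "absorb" only a signal that grows at exactly the rate set by the mirrored pole.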
Submitted 21 February, 2024;
originally announced February 2024.
-
Cellular Wireless Networks in the Upper Mid-Band
Authors:
Seongjoon Kang,
Marco Mezzavilla,
Sundeep Rangan,
Arjuna Madanayake,
Satheesh Bojja Venkatakrishnan,
Gregory Hellbourg,
Monisha Ghosh,
Hamed Rahmani,
Aditya Dhananjay
Abstract:
The upper mid-band, roughly from 7 to 24 GHz, has attracted considerable recent interest for new cellular services. This frequency range has vastly more spectrum than the highly congested bands below 7 GHz while offering more favorable propagation and coverage than the millimeter wave (mmWave) frequencies. The upper mid-band can thus provide a powerful and complementary frequency range to balance coverage and capacity. Realizing the full potential of these bands, however, will require fundamental changes to the design of cellular systems. Most importantly, spectrum will likely need to be shared with incumbents including communication satellites, military radar, and radio astronomy. Also, the upper mid-band is simply a vast frequency range. Due to this wide bandwidth, combined with the directional nature of transmission and intermittent occupancy of incumbents, cellular systems will need to be agile enough to sense and intelligently use large spatial and frequency degrees of freedom. This paper attempts to provide an initial assessment of the feasibility and potential gains of wideband cellular systems operating in the upper mid-band. The study includes: (1) a system study to assess potential gains of multi-band systems in a representative dense urban environment and illustrate the value of wideband systems with dynamic frequency selectivity; (2) an evaluation of potential cross-interference between satellites and terrestrial cellular services, and of interference nulling to reduce that interference; and (3) design and evaluation of a compact multi-band antenna array structure. Leveraging these preliminary results, we identify potential future research directions to realize next-generation systems in these frequencies.
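Interference nulling toward incumbents such as satellites can take many forms. One textbook form is projection-based null steering. The desired-direction weights are projected onto the orthogonal complement of the interferers' steering vectors, which places exact nulls in those directions at a small cost in main-beam gain. The sketch assumes a uniform linear array with half-wavelength spacing and hypothetical user and satellite angles. The paper's own nulling scheme may differ.

```python
import numpy as np


def steering_vector(n_elem, spacing_wl, angle_deg):
    """Uniform-linear-array steering vector; spacing given in wavelengths."""
    n = np.arange(n_elem)
    return np.exp(1j * 2 * np.pi * spacing_wl * n * np.sin(np.deg2rad(angle_deg)))


def null_steering_weights(n_elem, spacing_wl, desired_deg, interferer_degs):
    """Project desired-direction weights away from the span of interferer steering vectors."""
    a_d = steering_vector(n_elem, spacing_wl, desired_deg)
    A_i = np.column_stack([steering_vector(n_elem, spacing_wl, th) for th in interferer_degs])
    P = np.eye(n_elem) - A_i @ np.linalg.pinv(A_i)   # orthogonal projector onto complement
    return P @ a_d


def gain_db(w, a):
    """Normalized response |w^H a|^2 / (||w||^2 ||a||^2) in dB (0 dB = matched beam)."""
    num = abs(np.vdot(w, a)) ** 2
    den = np.vdot(w, w).real * np.vdot(a, a).real
    return 10 * np.log10(max(num / den, 1e-300))


N_ELEM, SPACING_WL = 16, 0.5      # hypothetical array
DESIRED_DEG = 10.0                # hypothetical terrestrial user direction
INTERFERER_DEGS = [-40.0, 55.0]   # hypothetical satellite directions

w = null_steering_weights(N_ELEM, SPACING_WL, DESIRED_DEG, INTERFERER_DEGS)
for th in [DESIRED_DEG] + INTERFERER_DEGS:
    print(f"{th:6.1f} deg: {gain_db(w, steering_vector(N_ELEM, SPACING_WL, th)):8.1f} dB")
```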
Submitted 6 March, 2024; v1 submitted 6 September, 2023;
originally announced September 2023.
-
Fano-Qubits for Quantum Devices with Enhanced Isolation and Bandwidth
Authors:
Deepanshu Trivedi,
Leonid Belostotski,
Arjuna Madanayake,
Alex Krasnok
Abstract:
Magneto-optical isolators and circulators have been widely used to safeguard quantum devices from reflections and noise in the readout stage. However, these devices have limited bandwidth and low tunability, are bulky, and suffer from high losses, making them incompatible with planar technologies such as circuit QED. To address these limitations, we propose a new approach to quantum non-reciprocity that utilizes the intrinsic nonlinearity of qubits and broken spatial symmetry. We show that a circuit containing Lorentz-type qubits can be transformed into Fano-type qubits with an asymmetric spectral response, resulting in a significant improvement in isolation (up to 40 dB) and a twofold increase in spectral bandwidth (up to 200 MHz). Our analysis is based on realistic circuit parameters, validated by existing experimental results, and supported by rigorous quantum simulations. This approach could enable the development of compact, high-performance, and planar-compatible non-reciprocal quantum devices with potential applications in quantum computing, communication, and sensing.
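The contrast between Lorentz-type and Fano-type responses can be seen from the standard Fano line shape (q + ε)²/(1 + ε²). Here ε is the detuning in units of the half-linewidth and q is the asymmetry parameter. The profile drops to zero at ε = -q and peaks at 1 + q² at ε = 1/q, giving a sharp transition between nearby frequencies. A Lorentzian is symmetric about its center. This is only the generic line-shape formula with a hypothetical q = 1.5, not the paper's qubit circuit model.

```python
import numpy as np


def fano_profile(eps, q):
    """Fano line shape (q + eps)^2 / (1 + eps^2); eps = detuning / half-linewidth."""
    eps = np.asarray(eps, dtype=float)
    return (q + eps) ** 2 / (1.0 + eps ** 2)


def lorentzian(eps):
    """Symmetric Lorentzian 1 / (1 + eps^2), the q -> infinity limit of fano_profile / q^2."""
    eps = np.asarray(eps, dtype=float)
    return 1.0 / (1.0 + eps ** 2)


q = 1.5                               # hypothetical asymmetry parameter
eps = np.linspace(-10.0, 10.0, 20001)  # detuning grid, step 1e-3
profile = fano_profile(eps, q)
print("analytic zero at eps =", -q, "; analytic peak", 1 + q ** 2, "at eps =", 1 / q)
print("grid peak at eps =", eps[np.argmax(profile)])
```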
Submitted 17 March, 2023;
originally announced March 2023.
-
Vision Transformer with Convolutional Encoder-Decoder for Hand Gesture Recognition using 24 GHz Doppler Radar
Authors:
Kavinda Kehelella,
Gayangana Leelarathne,
Dhanuka Marasinghe,
Nisal Kariyawasam,
Viduneth Ariyarathna,
Arjuna Madanayake,
Ranga Rodrigo,
Chamira U. S. Edussooriya
Abstract:
Transformers combined with convolutional encoders have recently been used for hand gesture recognition (HGR) using micro-Doppler signatures. We propose a vision-transformer-based architecture for HGR with multi-antenna continuous-wave Doppler radar receivers. The proposed architecture consists of three modules: a convolutional encoder-decoder, an attention module with three transformer layers, and a multi-layer perceptron. The novel convolutional decoder helps to feed patches with larger sizes to the attention module for improved feature extraction. Experimental results obtained with a dataset corresponding to a two-antenna continuous-wave Doppler radar receiver operating at 24 GHz (published by Skaria et al.) confirm that the proposed architecture achieves an accuracy of 98.3%, which substantially surpasses the state of the art on this dataset.
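Before the attention module, a vision transformer splits its 2-D input into non-overlapping patches and flattens each patch into a token. The sketch shows that step alone. The input is a hypothetical two-channel (one per receive antenna) 64 × 64 map, and the 16 × 16 patch size is an assumption. The paper's encoder-decoder, linear patch embedding, and position encodings are not shown. Larger patches yield fewer but longer tokens: the token count is H·W/(p_h·p_w).

```python
import numpy as np


def patchify(feature_map: np.ndarray, patch_h: int, patch_w: int) -> np.ndarray:
    """Split a (C, H, W) map into non-overlapping patches -> (num_patches, C*patch_h*patch_w)."""
    c, h, w = feature_map.shape
    if h % patch_h or w % patch_w:
        raise ValueError("H and W must be divisible by the patch size")
    x = feature_map.reshape(c, h // patch_h, patch_h, w // patch_w, patch_w)
    x = x.transpose(1, 3, 0, 2, 4)   # (n_rows, n_cols, C, patch_h, patch_w), row-major patch order
    return x.reshape(-1, c * patch_h * patch_w)


# Hypothetical input: 2 receive channels, 64 x 64 time-Doppler map.
spec = np.arange(2 * 64 * 64, dtype=float).reshape(2, 64, 64)
tokens = patchify(spec, 16, 16)
print(tokens.shape)  # (16, 512)
```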
Submitted 12 September, 2022;
originally announced September 2022.
-
Low-Complexity Loeffler DCT Approximations for Image and Video Coding
Authors:
D. F. G. Coelho,
R. J. Cintra,
F. M. Bayer,
S. Kulasekera,
A. Madanayake,
P. A. C. Martinez,
T. L. T. Silveira,
R. S. Oliveira,
V. S. Dimitrov
Abstract:
This paper introduces a matrix parametrization method based on the Loeffler discrete cosine transform (DCT) algorithm. As a result, a new class of eight-point DCT approximations is proposed, capable of unifying the mathematical formalism of several eight-point DCT approximations reported in the literature. Pareto-efficient DCT approximations are obtained through multicriteria optimization, where computational complexity, proximity, and coding performance are considered. Efficient approximations and their scaled 16- and 32-point versions are embedded into image and video encoders, including a JPEG-like codec and the H.264/AVC and H.265/HEVC standards. Results are compared to the unmodified standard codecs. Efficient approximations are mapped and implemented on a Xilinx VLX240T FPGA and evaluated for area, speed, and power consumption.
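Proximity to the exact DCT is one of the criteria the abstract lists. The sketch shows a generic way to score any low-complexity integer matrix T. First, form the orthonormalized approximation Ĉ = S·T with S = diag(1/√diag(T·Tᵀ)). Then measure its Frobenius distance to the exact DCT-II and its deviation from orthogonality. The candidate used here, round(2·C), is only a simple hypothetical example. The paper's Loeffler-parametrized class and its coding metrics are not reproduced.

```python
import numpy as np


def dct2_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix C[k, m] = a_k * cos(pi * (2m + 1) * k / (2n))."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)   # a_0 = sqrt(1/n)
    return c


C8 = dct2_matrix(8)
T = np.round(2 * C8).astype(int)                  # hypothetical candidate, entries in {0, +-1}
S = np.diag(1.0 / np.sqrt(np.diag(T @ T.T)))      # row normalization
C_hat = S @ T
print(T)
print("distance to exact DCT:", np.linalg.norm(C_hat - C8))
print("deviation from orthogonality:", np.linalg.norm(C_hat @ C_hat.T - np.eye(8)))
```

Because T contains only 0 and ±1, applying it needs additions only. The scaling S can often be merged into the quantization step of an encoder.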
Submitted 28 July, 2022;
originally announced July 2022.
-
Towards a Low-SWaP 1024-beam Digital Array: A 32-beam Sub-system at 5.8 GHz
Authors:
Arjuna Madanayake,
Viduneth Ariyarathna,
Suresh Madishetty,
Sravan Pulipati,
R. J. Cintra,
Diego Coelho,
Raíza Oliveira,
Fábio M. Bayer,
Leonid Belostotski,
Soumyajit Mandal,
Theodore S. Rappaport
Abstract:
Millimeter wave communications require multibeam beamforming in order to utilize wireless channels that suffer from obstructions, path loss, and multi-path effects. Digital multibeam beamforming offers the most degrees of freedom compared to analog phased arrays. However, circuit complexity and power consumption are important constraints for digital multibeam systems. A low-complexity digital computing architecture is proposed for a multiplication-free 32-point linear transform that approximates multiple simultaneous RF beams similar to a discrete Fourier transform (DFT). Multiplicative complexity is reduced from the $\mathcal{O}(N \log N)$ of FFT-based DFT realizations to zero, yielding 46% and 55% reductions in chip area and dynamic power consumption, respectively, for the $N=32$ case considered. The paper describes the proposed 32-point DFT approximation targeting 1024 beams using a 2-D array, and shows the multiplierless approximation and its mapping to a 32-beam sub-system consisting of 5.8 GHz antennas that can be used for generating 1024 digital beams without multiplications. Real-time beam computation is achieved using a Xilinx FPGA at 120 MHz bandwidth per beam. Theoretical beam performance is compared with measured RF patterns from both a fixed-point FFT and the proposed multiplier-free algorithm, showing good agreement.
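Replacing DFT twiddle factors with values that need only shifts and additions is one way to remove multipliers. In this hypothetical scheme, not the paper's 32-point approximation, the real and imaginary parts of every twiddle are rounded to the nearest multiple of 1/2. Each entry then lies in {0, ±1/2, ±1} (plus j times the same set). The sketch checks that, for N = 32, every aligned plane wave still peaks in its own beam. This holds because each entry errs by at most about 0.354 in magnitude, so the correct beam keeps at least about 0.65·N while every other beam stays below about 0.35·N.

```python
import numpy as np


def dft_matrix(n: int) -> np.ndarray:
    """Exact DFT matrix W[k, m] = exp(-2j*pi*k*m/n)."""
    k = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(k, k) / n)


def dyadic_approx_dft(n: int) -> np.ndarray:
    """Hypothetical multiplierless approximation: round Re/Im of each twiddle to a multiple of 1/2."""
    w = dft_matrix(n)
    return np.round(2 * w.real) / 2 + 1j * np.round(2 * w.imag) / 2


N = 32
A_hat = dyadic_approx_dft(N)
n = np.arange(N)
beam_hits = []
worst_loss_db = 0.0
for k0 in range(N):
    x = np.exp(2j * np.pi * k0 * n / N)          # plane wave aligned with beam k0
    y = A_hat @ x                                 # approximate multi-beam output
    beam_hits.append(int(np.argmax(np.abs(y))) == k0)
    worst_loss_db = min(worst_loss_db, 20 * np.log10(abs(y[k0]) / N))
print("all beams correct:", all(beam_hits), "| worst main-beam change (dB):", worst_loss_db)
```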
Submitted 29 May, 2024; v1 submitted 18 July, 2022;
originally announced July 2022.
-
Fast Radix-32 Approximate DFTs for 1024-Beam Digital RF Beamforming
Authors:
A. Madanayake,
R. J. Cintra,
N. Akram,
V. Ariyarathna,
S. Mandal,
V. A. Coutinho,
F. M. Bayer,
D. Coelho,
T. S. Rappaport
Abstract:
The discrete Fourier transform (DFT) is widely employed for multi-beam digital beamforming. The DFT can be efficiently implemented through fast Fourier transform (FFT) algorithms, thus reducing chip area, power consumption, processing time, and consumption of other hardware resources. This paper proposes three new hybrid 1024-point DFT approximations and their respective fast algorithms. These approximate DFT (ADFT) algorithms have significantly reduced circuit complexity and power consumption compared to traditional FFT approaches, at the cost of a slight loss in computational precision that is acceptable for digital beamforming applications in RF antenna implementations. ADFT algorithms have not previously been introduced for beamforming beyond $N = 32$, but this paper anticipates the need for massively large adaptive arrays for future 5G and 6G systems. Digital CMOS circuit designs for the ADFTs show the resulting improvements in both circuit complexity and power consumption metrics. Simulation results show similar or lower critical path delay with up to 48.5% lower chip area compared to a standard Cooley-Tukey FFT. The time-area and dynamic power metrics are reduced by up to 66.0%. The 1024-point ADFT beamformers produce signal-to-noise ratio (SNR) gains between 29.2 and 30.1 dB, a worst-case loss of at most 0.9 dB in SNR gain compared to exact 1024-point DFT beamformers realizable using an FFT.
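The 30.1 dB figure for the exact beamformer is the coherent array gain 10·log₁₀(1024) ≈ 30.10 dB. A matched N-element beam increases signal power by N² and noise power by N when the noise is independent across elements. The sketch computes |wᴴa|²/‖w‖² for unit-modulus steering vectors and unit-power i.i.d. noise. It then shows that any mismatched weights, here a hypothetical 3-bit phase quantization, can only lower that gain (Cauchy-Schwarz). The paper's ADFTs are not reproduced.

```python
import numpy as np


def snr_gain_db(weights, steering):
    """Array SNR gain |w^H a|^2 / ||w||^2 for unit-modulus a and i.i.d. unit-power noise."""
    signal = abs(np.vdot(weights, steering)) ** 2
    noise = np.vdot(weights, weights).real
    return 10 * np.log10(signal / noise)


N = 1024
n = np.arange(N)
k0 = 100                                          # hypothetical beam index
a = np.exp(2j * np.pi * k0 * n / N)               # plane wave aligned with beam k0
w_exact = np.exp(2j * np.pi * k0 * n / N)         # matched (exact DFT) beam weights
step = np.pi / 4                                  # hypothetical 3-bit phase quantizer
w_quant = np.exp(1j * np.round(np.angle(w_exact) / step) * step)

print(f"exact beam gain:     {snr_gain_db(w_exact, a):.2f} dB")
print(f"quantized beam gain: {snr_gain_db(w_quant, a):.2f} dB")
```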
Submitted 12 July, 2022;
originally announced July 2022.
-
Block-Parallel Systolic-Array Architecture for 2-D NTT-based Fragile Watermark Embedding
Authors:
H. P. L. Arjuna Madanayake,
R. J. Cintra,
V. S. Dimitrov,
L. Bruton
Abstract:
Number-theoretic transforms (NTTs) have been applied in the fragile watermarking of digital images. A block-parallel systolic-array architecture is proposed for watermarking based on the 2-D special Hartley NTT (HNTT). The proposed core employs two 2-D special HNTT hardware cores, each using digital arithmetic over $\mathrm{GF}(3)$, and processes $4\times4$ blocks of pixels in parallel every clock cycle. Prototypes are operational on a Xilinx Sx35-10ff668 FPGA device. The maximum estimated throughput of the FPGA circuit is 100 million $4\times4$ HNTT fragile watermarked blocks per second, when clocked at 100 MHz. Potential applications exist in high-traffic back-end servers dealing with large amounts of protected digital images requiring authentication, in remote-sensing for high-security surveillance applications, in real-time video processing of information of a sensitive nature or matters of national security, in video/photographic content management of corporate clients, in authenticating multimedia for the entertainment industry, in the authentication of electronic evidence material, and in real-time news streaming.
Submitted 2 June, 2022;
originally announced June 2022.
-
Radix-2 Self-Recursive Sparse Factorizations of Delay Vandermonde Matrices for Wideband Multi-Beam Antenna Arrays
Authors:
S. M. Perera,
A. Madanayake,
R. J. Cintra
Abstract:
This paper presents a self-contained factorization for the Vandermonde matrices associated with true-time-delay-based wideband analog multi-beam beamforming using antenna arrays. The proposed factorization contains sparse and orthogonal matrices. Novel self-recursive radix-2 algorithms for Vandermonde matrices associated with true-time-delay-based delay-sum filterbanks are presented to reduce the circuit complexity of multi-beam analog beamforming systems. The proposed algorithms for multiplying these Vandermonde matrices by a vector attain $\mathcal{O}(N \log N)$ delay-amplifier circuit counts. Error bounds for the Vandermonde matrices associated with true-time delay are established and then analyzed for numerical stability. The potential for real-world circuit implementation of the proposed algorithms is shown through signal flow graphs that are the starting point for high-frequency analog circuit realizations.
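Radix-2 Vandermonde algorithms build on an even/odd column split:
V_N(α)·x = V_{N/2}(α²)·x_even + diag(α)·V_{N/2}(α²)·x_odd.
The sketch applies this identity recursively and checks it against the direct product for arbitrary unit-modulus (delay) nodes. For general nodes, the recursion alone saves nothing. Savings appear only when the squared nodes coincide, as in the FFT, or through structure such as the sparse factors the paper derives, which are not reproduced here. The function names are illustrative.

```python
import numpy as np


def vandermonde(alpha: np.ndarray, n: int) -> np.ndarray:
    """V[k, m] = alpha_k ** m."""
    return alpha[:, None] ** np.arange(n)[None, :]


def vandermonde_matvec_radix2(alpha: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Compute V(alpha) @ x by recursive even/odd splitting (len(x) must be a power of two)."""
    if len(x) == 1:
        return np.full(alpha.shape, x[0], dtype=complex)     # single column alpha**0 = 1
    alpha_sq = alpha * alpha
    even = vandermonde_matvec_radix2(alpha_sq, x[0::2])       # V_{N/2}(alpha^2) x_even
    odd = vandermonde_matvec_radix2(alpha_sq, x[1::2])        # V_{N/2}(alpha^2) x_odd
    return even + alpha * odd                                 # diag(alpha) applied to odd part


rng = np.random.default_rng(0)
alpha = np.exp(-2j * np.pi * rng.random(16))    # hypothetical delay nodes on the unit circle
x = rng.standard_normal(16) + 1j * rng.standard_normal(16)
print(np.max(np.abs(vandermonde_matvec_radix2(alpha, x) - vandermonde(alpha, 16) @ x)))
```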
Submitted 1 June, 2022;
originally announced June 2022.
-
Efficient and Self-Recursive Delay Vandermonde Algorithm for Multi-Beam Antenna Arrays
Authors:
S. M. Perera,
A. Madanayake,
R. J. Cintra
Abstract:
This paper presents a self-contained factorization for the delay Vandermonde matrix (DVM), a superclass of the discrete Fourier transform, using sparse and companion matrices. An efficient DVM algorithm is proposed to reduce the complexity of radio-frequency (RF) $N$-beam analog beamforming systems. Wideband multi-beam beamformers have applications in wireless communication networks such as 5G/6G systems, where system capacity can be improved by exploiting the signal-to-noise ratio (SNR) gain obtained through coherent summation of propagating waves based on their directions of propagation. The presence of multiple RF beams allows multiple independent wireless links to be established at high SNR, or to be used in conjunction with multiple-input multiple-output (MIMO) wireless systems, with the overall goal of improving system SNR and therefore capacity. To realize such multi-beam beamformers at acceptable analog circuit complexity, we use a sparse factorization of the DVM to derive a DVM algorithm with low arithmetic complexity. The paper also establishes an error bound and a stability analysis for the proposed DVM algorithm. The proposed efficient DVM algorithm is aimed at analog implementation; for purposes of evaluation, it can also be realized on both digital hardware and software-defined radio platforms.
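The statement that the DVM is a superclass of the DFT can be checked directly. When the nodes are the N-th roots of unity exp(-2πjk/N), the DVM A[k, m] = α_k^m equals the DFT matrix used by `numpy.fft.fft`. With true-time-delay nodes exp(-j2πfτ_k), the nodes stay on the unit circle but generally are not roots of unity. The delays and frequency in the sketch are hypothetical.

```python
import numpy as np


def delay_vandermonde(alpha: np.ndarray, n: int) -> np.ndarray:
    """DVM with nodes alpha_k: A[k, m] = alpha_k ** m."""
    return alpha[:, None] ** np.arange(n)[None, :]


N = 8
k = np.arange(N)

# DFT special case: nodes are the N-th roots of unity exp(-2j*pi*k/N).
A_dft = delay_vandermonde(np.exp(-2j * np.pi * k / N), N)

# General TTD case: nodes exp(-2j*pi*f*tau_k) with hypothetical per-beam delays.
f_hz = 28e9
tau_s = np.linspace(-10e-12, 10e-12, N)
A_ttd = delay_vandermonde(np.exp(-2j * np.pi * f_hz * tau_s), N)

x = np.arange(N) + 1j
print(np.max(np.abs(A_dft @ x - np.fft.fft(x))))   # ~1e-14: DVM reduces to the DFT
```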
Submitted 1 June, 2022;
originally announced June 2022.
-
Orbital Angular Momentum (OAM) Carrying Vortex Wave generation in Dielectric Filled Circular Waveguide
Authors:
Md Khadimul Islam,
Arjuna Madanayake,
Shubhendu Bhardwaj
Abstract:
In this paper, we propose a method to generate Orbital Angular Momentum (OAM) carrying vortex waves inside a metallic circular waveguide (CW). These waves can carry multiple orthogonal modes at the same frequency by virtue of their unique spatial structure, so high-data-rate channels can be developed using them. In free space, OAM-carrying vortex waves suffer from beam divergence and a central null, which make them unfavourable for free-space communication. OAM modes in guided structures, however, do not suffer from these drawbacks. This prospect of enhancing the communication spectrum motivates the study of vortex waves in circular waveguides. In this work, a radial array of monopoles is designed to generate the vortex wave inside the waveguide. Further, we introduce dielectric materials inside the waveguide to manipulate the operating frequency of the OAM modes. Simulation results show that different dielectric materials allow the working frequency of the OAM beam to be tuned to a desired value.
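Two textbook relations help explain the approach, under stated assumptions. First, a common way to excite an OAM mode of order l with a ring of N elements is to feed element n with phase 2πln/N. Second, filling a circular guide of radius a with a lossless dielectric (μ_r = 1) scales every cutoff frequency by 1/√ε_r. For TE11, for example, f_c = x'₁₁·c/(2πa√ε_r) with x'₁₁ ≈ 1.8412. The radius, permittivities, and the assumption that the monopole ring is fed this way are hypothetical, not taken from the paper.

```python
import math

C = 299_792_458.0   # speed of light, m/s
X11_PRIME = 1.8412  # first root of J1'(x); sets the TE11 cutoff of a circular guide


def te11_cutoff_hz(radius_m: float, eps_r: float = 1.0) -> float:
    """TE11 cutoff of a circular waveguide fully filled with a lossless dielectric (mu_r = 1)."""
    return X11_PRIME * C / (2 * math.pi * radius_m * math.sqrt(eps_r))


def oam_feed_phases_deg(n_monopoles: int, mode_l: int) -> list:
    """Progressive feed phases 360*l*n/N (mod 360) for a ring of n_monopoles elements."""
    return [(360.0 * mode_l * n / n_monopoles) % 360.0 for n in range(n_monopoles)]


RADIUS_M = 0.01  # hypothetical 10 mm guide radius
for eps_r in (1.0, 2.2, 4.4, 10.2):
    print(f"eps_r = {eps_r:4.1f}: TE11 cutoff ~ {te11_cutoff_hz(RADIUS_M, eps_r) / 1e9:5.2f} GHz")
print("l = 1 feed phases for 4 monopoles:", oam_feed_phases_deg(4, 1))
```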
Submitted 28 September, 2021;
originally announced September 2021.
-
Design of Maximum-Gain Dielectric Lens Antenna via Phase Center Analysis
Authors:
Md Khadimul Islam,
Arjuna Madanayake,
Shubhendu Bhardwaj
Abstract:
In this work, a method is presented to maximize the gain obtained from millimeter-wave (mm-wave) lens antennas using phase center analysis. Commonly, when designing a lens antenna, the lens is positioned directly on top of the antenna element, which does not provide the maximum gain/aperture efficiency. A novel solution is proposed in which the lens is placed at a distance calculated using phase center analysis to produce the maximum gain from the system. A mm-wave microstrip antenna array is designed and the proposed method is applied for gain enhancement. Simulation results suggest that the proposed scheme achieves around 25% gain enhancement compared to the traditional method.
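A common way to locate a feed's phase center along boresight is a least-squares fit of its far-field phase cut. A source displaced by d along the axis adds roughly k·d·cos θ of phase, so fitting φ(θ) ≈ φ₀ + b·cos θ gives d = b/k. The lens can then be positioned relative to this point rather than the element surface. This is a generic estimator, not necessarily the paper's procedure. The frequency and offset used to synthesize the data are hypothetical.

```python
import numpy as np

C = 299_792_458.0  # speed of light, m/s


def estimate_phase_center_m(theta_rad, phase_rad, freq_hz):
    """Least-squares axial phase-center offset from a far-field phase cut.

    Assumes phase(theta) ~ phi0 + k*d*cos(theta) for a source displaced by d along boresight.
    """
    k = 2 * np.pi * freq_hz / C
    phase = np.unwrap(phase_rad)
    design = np.column_stack([np.ones_like(theta_rad), np.cos(theta_rad)])
    (phi0, slope), *_ = np.linalg.lstsq(design, phase, rcond=None)
    return slope / k


FREQ_HZ = 28e9            # hypothetical mm-wave frequency
TRUE_OFFSET_M = 2.5e-3    # hypothetical offset used only to synthesize test data
theta = np.deg2rad(np.linspace(-40.0, 40.0, 81))
k = 2 * np.pi * FREQ_HZ / C
measured = np.angle(np.exp(1j * (0.3 + k * TRUE_OFFSET_M * np.cos(theta))))  # wrapped phase

d_hat = estimate_phase_center_m(theta, measured, FREQ_HZ)
print(f"estimated phase-center offset: {d_hat * 1e3:.3f} mm")
```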
Submitted 28 September, 2021;
originally announced September 2021.
-
Low-complexity Scaling Methods for DCT-II Approximations
Authors:
D. F. G. Coelho,
R. J. Cintra,
A. Madanayake,
S. Perera
Abstract:
This paper introduces a collection of scaling methods for generating $2N$-point DCT-II approximations based on $N$-point low-complexity transformations. Such scaling is based on the Hou recursive matrix factorization of the exact $2N$-point DCT-II matrix. Encompassing the widely employed Jridi-Alfalou-Meher (JAM) scaling method, the proposed techniques are shown to produce DCT-II approximations that outperform the transforms resulting from the JAM scaling method in terms of total error energy and mean squared error. Orthogonality conditions are derived, and an extensive error analysis based on statistical simulation demonstrates the good performance of the introduced scaling methods. A hardware implementation is also provided, demonstrating the competitiveness of the proposed methods when compared to the JAM scaling method.
Submitted 11 February, 2024; v1 submitted 4 August, 2021;
originally announced August 2021.
-
Millimeter-Wave Antenna Array Diagnosis with Partial Channel State Information
Authors:
George Medina,
Akashdeep Singh Jida,
Sravan Pulipati,
Rohith Talwar,
Nancy Amala J,
Tareq Y. Al-Naffouri,
Arjuna Madanayake,
Mohammed Eltayeb
Abstract:
Large antenna arrays enable directional precoding for millimeter-wave (mmWave) systems and provide sufficient link budget to combat the high path loss at these frequencies. Due to atmospheric conditions and hardware malfunction, outdoor mmWave antenna arrays are prone to blockages or complete failures. This results in a modified array geometry, a distorted far-field radiation pattern, and degraded system performance. Remote array diagnostic techniques have recently emerged as an effective way to detect defective antenna elements in an array with few diagnostic measurements. These techniques, however, require full and perfect channel state information (CSI), which can be challenging to acquire in the presence of antenna faults. This paper proposes a new remote array diagnosis technique that relaxes the need for full CSI and only requires knowledge of the incident angles of arrival, i.e., partial channel knowledge. Numerical results demonstrate the effectiveness of the proposed technique and show that fault detection can be achieved with a number of diagnostic measurements comparable to that required by diagnostic techniques based on full channel knowledge. In the presence of channel estimation errors, the proposed technique is shown to outperform recently proposed array diagnostic techniques.
Submitted 2 November, 2020;
originally announced November 2020.
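A simplified sketch of the measurement model behind this kind of diagnosis: once the angle of arrival is known, the difference between the output a healthy array would produce and the measured output is linear in a fault-indicator vector. The array size, fault pattern, random probing weights, and the plain least-squares solve (using as many measurements as elements) are all assumptions made here; the paper's point is to detect faults from far fewer measurements, which would call for a sparse-recovery solver instead.

```python
import numpy as np

rng = np.random.default_rng(0)
n_elem = 8                      # elements in a hypothetical ULA
theta = np.deg2rad(20.0)        # known angle of arrival (the "partial CSI")
steer = np.exp(1j * np.pi * np.arange(n_elem) * np.sin(theta))  # half-wavelength spacing

gains = np.ones(n_elem)
gains[[2, 5]] = 0.0             # assumed fault model: elements 2 and 5 fully blocked

n_meas = n_elem                 # full-rank case; the paper targets far fewer measurements
w = rng.standard_normal((n_meas, n_elem)) + 1j * rng.standard_normal((n_meas, n_elem))

expected = w @ steer            # what a healthy array would report
measured = w @ (gains * steer)  # what the faulty array reports
diff = expected - measured      # = W diag(steer) f, with f = 1 at failed elements

f_hat, *_ = np.linalg.lstsq(w * steer, diff, rcond=None)
detected = np.flatnonzero(np.abs(f_hat) > 0.5)
print(detected)                 # indices flagged as failed
```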
-
Xilinx RF-SoC-based Digital Multi-Beam Array Processors for 28/60 GHz Wireless Testbeds
Authors:
Sravan Pulipati,
Viduneth Ariyarathna,
Aditya Dhananjay,
Mohammed E. Eltayeb,
Marco Mezzavilla,
Josep M. Jornet,
Soumyajit Mandal,
Shubhendu Bhardwaj,
Arjuna Madanayake
Abstract:
Emerging wireless applications such as 5G cellular, large intelligent surfaces (LIS), and holographic massive MIMO require antenna array processing at mm-wave frequencies with large numbers of independent digital transceivers. This paper summarizes the authors' recent progress on the design and testing of 28 GHz and 60 GHz fully-digital array processing platforms based on wideband reconfigurable FPGA-based software-defined radios (SDRs). The digital baseband and microwave interfacing aspects of the SDRs are implemented on single-chip RF system-on-chip (RF-SoC) processors from Xilinx. Two versions of the RF-SoC technology (ZCU-111 and ZCU-1275) were used to implement fully-digital real-time array processors at 28 GHz (realizing 4 parallel beams with 0.8 GHz bandwidth per beam) and 60 GHz (realizing 4 parallel beams with 1.8 GHz bandwidth per beam). Dielectric lenslet arrays fed by a digital phased-array feed (PAF) located on the focal plane are proposed for further increasing antenna array gain.
Submitted 3 August, 2020;
originally announced August 2020.
-
A Passive STAR Microwave Circuit for 1-3 GHz Self-Interference Cancellation
Authors:
Udara De Silva,
Sravan Pulipati,
Satheesh Bojja Venkatakrishnan,
Shubhendu Bhardwaj,
Arjuna Madanayake
Abstract:
Simultaneous transmit and receive (STAR) allows full-duplex operation of a radio, which doubles capacity for a given bandwidth. Achieving STAR typically requires a circulator with high isolation between the transmit and receive ports and low loss from the antenna to the receive port. Conventional circulators do not offer wideband performance. Although wideband circulators have been proposed using parametric, switched delay-line/capacitor, and N-path filter techniques in custom integrated circuits, these magnet-free devices suffer from nonlinearity, noise, aliasing, and switching-noise injection. In this paper, a STAR front-end based on a passive linear microwave circuit is proposed. Here, a dummy antenna located inside a miniature RF-silent absorption chamber allows circulator-free STAR using simple COTS components. The proposed approach is highly linear, free from noise, does not require switching or parametric modulation circuits, and has virtually unlimited bandwidth, set only by the performance of COTS passive microwave components. The trade-off is the relatively large size of the miniature RF-shielded chamber, which makes this approach suitable for base-station applications. Preliminary measurements show Tx/Rx isolation of 25-60 dB over the 1.0-3.0 GHz range and 50-60 dB over the 2.4-2.7 GHz range.
Submitted 17 August, 2020; v1 submitted 3 August, 2020;
originally announced August 2020.
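The abstract does not spell out the cancellation mechanism. Assuming the dummy-antenna path supplies a replica of the Tx leakage that is subtracted from the receive path, the achievable isolation is limited by how well the two paths match in amplitude and phase. The sketch below evaluates that textbook relation, $-20\log_{10}|1 - a e^{j\phi}|$; the function name and the mismatch values are illustrative assumptions, not measurements from the paper.

```python
import math


def cancellation_db(gain_error_db, phase_error_deg):
    """Isolation (dB) when leakage is subtracted from an imperfectly matched replica."""
    a = 10 ** (-gain_error_db / 20)          # amplitude ratio of the replica path
    phi = math.radians(phase_error_deg)      # phase error of the replica path
    residual = abs(1 - a * complex(math.cos(phi), math.sin(phi)))
    return math.inf if residual == 0 else -20 * math.log10(residual)


for g, p in [(0.5, 3.0), (0.1, 1.0), (0.01, 0.1)]:
    print(f"{g} dB / {p} deg mismatch -> {cancellation_db(g, p):.1f} dB isolation")
```

Under this assumption, isolation in the tens of dB requires amplitude matching to a fraction of a dB and phase matching to about a degree, which is consistent with isolation varying across a wide band.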
-
A Direct-Conversion Digital Beamforming Array Receiver with 800 MHz Channel Bandwidth at 28 GHz using Xilinx RF SoC
Authors:
Sravan Pulipati,
Viduneth Ariyarathna,
Udara De Silva,
Najath Akram,
Elias Alwan,
Arjuna Madanayake,
Soumyajit Mandal,
Theodore S. Rappaport
Abstract:
This paper discusses early results associated with a fully-digital direct-conversion array receiver at 28 GHz. The proposed receiver makes use of commercial off-the-shelf (COTS) electronics, including the receiver chain. The design consists of a custom 28 GHz patch antenna sub-array providing gain in the elevation plane, with azimuthal-plane beamforming provided by real-time digital signal processing (DSP) algorithms running on a Xilinx Radio Frequency System on Chip (RF SoC). The proposed array receiver employs element-wise fully-digital array processing that supports ADC sample rates up to 2 GS/s and up to 1 GHz of operating bandwidth per antenna. The RF mixed-signal data conversion circuits and DSP algorithms operate on a single-chip RF SoC installed on the Xilinx ZCU1275 prototyping platform.
Submitted 20 November, 2019;
originally announced November 2019.
-
Low-complexity 8-point DCT Approximation Based on Angle Similarity for Image and Video Coding
Authors:
R. S. Oliveira,
R. J. Cintra,
F. M. Bayer,
T. L. T. da Silveira,
A. Madanayake,
A. Leite
Abstract:
Principal component analysis (PCA) is widely used for data decorrelation and dimensionality reduction. However, PCA may be impractical in real-time applications or in situations where energy and computing constraints are severe. In this context, the discrete cosine transform (DCT) becomes a low-cost alternative for data decorrelation. This paper presents a method to derive computationally efficient approximations to the DCT. The proposed method aims at minimizing the angle between the rows of the exact DCT matrix and the rows of the approximate transformation matrix. The resulting transformation matrices are orthogonal and have extremely low arithmetic complexity. Considering popular performance measures, one of the proposed transformation matrices outperforms the best competitors in both matrix error and coding capabilities. Practical applications in image and video coding demonstrate the relevance of the proposed transformation. In fact, we show that the proposed approximate DCT can outperform the exact DCT for image encoding under certain compression ratios. The proposed transform and its direct competitors are also physically realized as digital prototype circuits using FPGA technology.
Submitted 30 January, 2024; v1 submitted 8 August, 2018;
originally announced August 2018.
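The angle criterion can be illustrated with a brute-force search: for each row of the exact 8-point DCT, pick the vector with entries in $\{-1, 0, 1\}$ whose angle to that row is smallest. This sketch ignores the orthogonality constraints the paper imposes (so the selected rows need not be mutually orthogonal), and the alphabet and helper names are assumptions for illustration.

```python
import itertools

import numpy as np


def dct2_matrix(n):
    """Orthonormal n-point DCT-II matrix."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c


def closest_rows(c, alphabet=(-1, 0, 1)):
    """For each row of c, the low-complexity vector with the smallest angle to it."""
    n = c.shape[1]
    cand = np.array(list(itertools.product(alphabet, repeat=n)), dtype=float)
    cand = cand[np.any(cand != 0, axis=1)]          # the zero vector has no angle
    cand_unit = cand / np.linalg.norm(cand, axis=1, keepdims=True)
    rows, angles = [], []
    for row in c:
        cosines = cand_unit @ (row / np.linalg.norm(row))
        best = int(np.argmax(cosines))
        rows.append(cand[best])
        angles.append(np.degrees(np.arccos(np.clip(cosines[best], -1.0, 1.0))))
    return np.array(rows), np.array(angles)


approx, angles = closest_rows(dct2_matrix(8))
print(approx.astype(int))
print(np.round(angles, 2))  # per-row angle to the exact DCT, in degrees
```

Rows 0 and 4 of the 8-point DCT are already proportional to $\pm 1$ patterns, so their angle is zero; the other rows show the residual angle that an orthogonality-constrained search must trade off.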
-
VLSI Computational Architectures for the Arithmetic Cosine Transform
Authors:
N. Rajapaksha,
A. Madanayake,
R. J. Cintra,
J. Adikari,
V. S. Dimitrov
Abstract:
The discrete cosine transform (DCT) is a widely used and important signal processing tool employed in a plethora of applications. Typical fast algorithms for nearly exact computation of the DCT require floating-point arithmetic, are multiplier intensive, and accumulate round-off errors. The recently proposed arithmetic cosine transform (ACT), a fast algorithm, calculates the DCT exactly for null-mean input sequences using only additions and integer constant multiplications, with very low area complexity. Using the novel architecture described, the ACT can also be computed non-exactly for any input sequence, with low area complexity and low power consumption. As a trade-off, however, the ACT algorithm requires 10 non-uniformly sampled data points to calculate the 8-point DCT. This requirement can easily be satisfied in applications dealing with spatial signals, such as image sensors and biomedical sensor arrays, by placing sensor elements on a non-uniform grid. In this work, a hardware architecture for the computation of the null-mean ACT is proposed, followed by novel architectures that extend the ACT to non-null-mean signals. All circuits are physically implemented and tested on the Xilinx XC6VLX240T FPGA device and synthesized using a 45 nm TSMC standard-cell library for performance assessment.
Submitted 30 October, 2017;
originally announced October 2017.
-
A Single-Channel Architecture for Algebraic Integer Based 8$\times$8 2-D DCT Computation
Authors:
A. Edirisuriya,
A. Madanayake,
R. J. Cintra,
V. S. Dimitrov
Abstract:
An area-efficient row-parallel architecture is proposed for the real-time implementation of the bivariate algebraic integer (AI) encoded 2-D discrete cosine transform (DCT) for image and video processing. The proposed architecture computes the 8$\times$8 2-D DCT based on the Arai DCT algorithm. An improved fast algorithm for AI-based 1-D DCT computation is proposed along with a single-channel 2-D DCT architecture. The design improves on a recently published 4-channel AI DCT architecture by reducing the number of integer channels to one and the number of 8-point 1-D DCT cores from 5 to 2. The architecture offers exact computation of 8$\times$8 blocks of 2-D DCT coefficients up to the final reconstruction step (FRS), which converts the coefficients from the AI representation to fixed-point format using the method of expansion factors. Prototype circuits corresponding to FRS blocks based on two expansion factors are realized, tested, and verified on an FPGA chip, using a Xilinx Virtex-6 XC6VLX240T device. Post place-and-route results show a 20% reduction in area compared with the 2-D DCT architecture requiring five 1-D AI cores. The area-time and area-time${}^2$ complexity metrics are also reduced, by 23% and 22% respectively, for designs with 8-bit input word length. The digital realizations are simulated up to place and route for ASICs using 45 nm CMOS standard cells. The maximum estimated clock rate is 951 MHz for the CMOS realizations, indicating a throughput of 7.608$\cdot$10$^9$ pixels/s and an 8$\times$8 block rate of 118.875 MHz.
Submitted 26 October, 2017;
originally announced October 2017.
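The quoted throughput figures are consistent with a datapath that accepts one 8-pixel row per clock cycle, so that an 8$\times$8 block completes every 8 cycles. This reading is an inference from the numbers, not a statement in the abstract:

```latex
\begin{align}
R_{\text{pixel}} &= 8 \times 951\ \text{MHz} = 7.608 \times 10^{9}\ \text{pixels/s},\\
R_{\text{block}} &= \frac{R_{\text{pixel}}}{64} = \frac{951\ \text{MHz}}{8} = 118.875\ \text{MHz}.
\end{align}
```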
-
A Digital Hardware Fast Algorithm and FPGA-based Prototype for a Novel 16-point Approximate DCT for Image Compression Applications
Authors:
F. M. Bayer,
R. J. Cintra,
A. Edirisuriya,
A. Madanayake
Abstract:
The discrete cosine transform (DCT) is the key step in many image and video coding standards. The 8-point DCT is an important special case, with several widely investigated low-complexity approximations. However, the 16-point DCT has energy compaction advantages. Accordingly, this paper presents a new 16-point DCT approximation with null multiplicative complexity. The proposed transform matrix is orthogonal and its entries are only 0 and $\pm1$. The proposed transform outperforms the well-known Walsh-Hadamard transform and the current state-of-the-art 16-point approximation. A fast algorithm for the proposed transform is also introduced. This fast algorithm is experimentally validated using hardware implementations that are physically realized and verified on a 40 nm CMOS Xilinx Virtex-6 XC6VLX240T FPGA chip at a maximum clock rate of 342 MHz. Rapid prototypes on FPGA for an 8-bit input word size show a significant improvement in compressed image quality, by up to 1-2 dB, at the cost of only eight adders compared with the state-of-the-art 16-point DCT approximation algorithm in the literature [S. Bouguezel, M. O. Ahmad, and M. N. S. Swamy. A novel transform for image compression. In Proceedings of the 53rd IEEE International Midwest Symposium on Circuits and Systems (MWSCAS), 2010].
Submitted 6 February, 2017;
originally announced February 2017.
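Low-complexity transforms of this kind are commonly stated as an integer matrix $T$ with mutually orthogonal rows plus a diagonal scaling $S$ that makes $S\,T$ orthonormal; whether this paper uses exactly that form is an assumption here. The sketch checks row orthogonality and builds the scaling. Because the proposed 16-point matrix is not reproduced in this listing, it uses the 16-point Walsh-Hadamard matrix mentioned in the abstract as input; the helper name `orthonormalize` is hypothetical.

```python
import numpy as np


def orthonormalize(t):
    """Return S @ t with S = diag(1/sqrt(diag(t t^T))), provided t has orthogonal rows."""
    gram = t @ t.T
    if not np.allclose(gram, np.diag(np.diag(gram))):
        raise ValueError("rows are not mutually orthogonal")
    s = np.diag(1.0 / np.sqrt(np.diag(gram)))
    return s @ t


# Example input: the 16-point Walsh-Hadamard matrix (entries +/-1), built by Kronecker products.
h2 = np.array([[1, 1], [1, -1]])
h16 = np.kron(np.kron(h2, h2), np.kron(h2, h2))
c_hat = orthonormalize(h16)
print(np.allclose(c_hat @ c_hat.T, np.eye(16)))
```

In hardware, only the integer matrix needs to be computed with adders; the diagonal scaling can usually be deferred to a later stage.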
-
Low-complexity Pruned 8-point DCT Approximations for Image Encoding
Authors:
V. A. Coutinho,
R. J. Cintra,
F. M. Bayer,
S. Kulasekera,
A. Madanayake
Abstract:
Two multiplierless pruned 8-point discrete cosine transform (DCT) approximations are presented. Both transforms have lower arithmetic complexity than state-of-the-art methods. The performance of these new methods was assessed in the image compression context. A JPEG-like simulation was performed, demonstrating the adequacy and competitiveness of the introduced methods. Digital VLSI implementation in CMOS technology was also considered. Both methods were realized on the Berkeley Emulation Engine (BEE3).
Submitted 11 December, 2016;
originally announced December 2016.
-
Energy-efficient 8-point DCT Approximations: Theory and Hardware Architectures
Authors:
R. J. Cintra,
F. M. Bayer,
V. A. Coutinho,
S. Kulasekera,
A. Madanayake
Abstract:
Due to its remarkable energy compaction properties, the discrete cosine transform (DCT) is employed in a multitude of compression standards, such as JPEG and H.265/HEVC. Several low-complexity integer approximations of the DCT have been proposed for both 1-D and 2-D signal analysis. The increasing demand for low-complexity, energy-efficient methods calls for algorithms with even lower computational costs. In this paper, new 8-point DCT approximations with very low arithmetic complexity are presented. The new transforms are obtained by pruning state-of-the-art DCT approximations. The proposed algorithms were assessed in terms of arithmetic complexity, energy retention capability, and image compression performance. In addition, a metric combining performance and computational complexity measures was proposed. Results showed good performance and extremely low computational complexity. The introduced algorithms were mapped onto systolic-array digital architectures, physically realized as digital prototype circuits using FPGA technology, and mapped to 45 nm CMOS technology. All hardware-related metrics showed low resource consumption for the proposed pruned approximate transforms. The best proposed transform according to the introduced metric reduces power consumption by 21-25%.
Submitted 2 December, 2016;
originally announced December 2016.
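Pruning, in the sense used here, keeps only the transform rows that produce the lowest-frequency coefficients, so fewer outputs (and fewer additions) are computed. The sketch applies this to the exact 8-point DCT for clarity (the paper prunes low-complexity approximations instead) and measures how much of a smooth block's energy the retained 4$\times$4 coefficients capture. The test block, the value of `keep`, and the helper names are illustrative assumptions.

```python
import numpy as np


def dct2_matrix(n):
    """Orthonormal n-point DCT-II matrix."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c


def pruned_2d(block, t, keep):
    """Separable 2-D transform that keeps only the first `keep` rows of t."""
    tp = t[:keep]                      # pruning: drop the high-frequency rows
    coeffs = tp @ block @ tp.T         # keep x keep low-frequency coefficients
    recon = tp.T @ coeffs @ tp         # back-projection (valid because t is orthonormal)
    return coeffs, recon


i, j = np.mgrid[0:8, 0:8]
block = (i + j).astype(float)          # smooth test block (an assumption)
coeffs, recon = pruned_2d(block, dct2_matrix(8), keep=4)
retained = np.sum(coeffs ** 2) / np.sum(block ** 2)
print(coeffs.shape, round(float(retained), 4))
```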
-
Low-complexity Image and Video Coding Based on an Approximate Discrete Tchebichef Transform
Authors:
P. A. M. Oliveira,
R. J. Cintra,
F. M. Bayer,
S. Kulasekera,
A. Madanayake,
V. A. Coutinho
Abstract:
Linear transformations are highly relevant for data decorrelation applications such as image and video compression. In that sense, the discrete Tchebichef transform (DTT) possesses useful coding and decorrelation properties. The DTT kernel does not depend on the input data, so fast algorithms can be developed for real-time applications. However, the DTT fast algorithms presented in the literature have high computational complexity. In this work, we introduce a new low-complexity approximation for the DTT. The fast algorithm of the proposed transform is multiplication-free and requires a reduced number of additions and bit-shifting operations. Image and video compression simulations in popular standards show good performance of the proposed transform. In terms of hardware resource consumption, the FPGA realization shows a 43.1% reduction in configurable logic blocks, and the ASIC place-and-route realization shows a 57.7% reduction in the area-time figure when compared with the 2-D version of the exact DTT.
Submitted 10 October, 2024; v1 submitted 24 September, 2016;
originally announced September 2016.
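Because the DTT kernel depends only on the transform size, it can be generated once. One compact way to obtain the exact orthonormal kernel is to orthonormalize the monomials $1, x, x^2, \ldots$ over the grid $x = 0, \ldots, N-1$, which yields the discrete Tchebichef polynomials up to sign. The QR-based construction and the sign convention (positive value at $x = N-1$) are choices made for this sketch, not taken from the paper, and the proposed approximation itself is not reproduced.

```python
import numpy as np


def dtt_matrix(n):
    """Orthonormal n-point discrete Tchebichef transform via QR of a Vandermonde matrix."""
    x = np.arange(n, dtype=float)
    vander = np.vander(x, n, increasing=True)   # columns: 1, x, x^2, ...
    q, _ = np.linalg.qr(vander)                 # Gram-Schmidt on the uniform grid
    t = q.T                                     # row k: degree-k orthonormal polynomial
    t *= np.sign(t[:, -1])[:, None]             # sign convention: positive at x = n - 1
    return t


t8 = dtt_matrix(8)
print(np.round(t8, 3))
print(np.allclose(t8 @ t8.T, np.eye(8)))
```

For larger $N$ the Vandermonde matrix becomes ill-conditioned, and a three-term recurrence for the Tchebichef polynomials is the more robust choice.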
-
Multiplierless 16-point DCT Approximation for Low-complexity Image and Video Coding
Authors:
T. L. T. Silveira,
R. S. Oliveira,
F. M. Bayer,
R. J. Cintra,
A. Madanayake
Abstract:
An orthogonal 16-point approximate discrete cosine transform (DCT) is introduced. The proposed transform requires neither multiplications nor bit-shifting operations. A fast algorithm based on matrix factorization is introduced, requiring only 44 additions, the lowest arithmetic cost in the literature. To assess the introduced transform, computational complexity, similarity with the exact DCT, and coding performance measures are computed. Classical and state-of-the-art 16-point low-complexity transforms were used in a comparative analysis. In the context of image compression, the proposed approximation was evaluated via PSNR and SSIM measurements, attaining the best cost-benefit ratio among the competitors. For video encoding, the proposed approximation was embedded into the HEVC reference software for direct comparison with the original HEVC standard. Physically realized and tested on FPGA hardware, the proposed transform showed 35% and 37% improvements in the area-time and area-time-squared VLSI metrics, respectively, when compared with the best competing transform in the literature.
Submitted 23 June, 2016;
originally announced June 2016.
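Addition counts such as the 44 quoted above come from factoring the transform matrix into sparse butterfly stages. The sketch shows the bookkeeping on the 8-point Walsh-Hadamard matrix, not on the proposed transform (whose factorization is not given in this listing): a direct product with a $\pm1$ matrix costs $N(N-1)$ additions, while the radix-2 factorization needs $N\log_2 N$. Subtractions are counted as additions, and the helper names are hypothetical.

```python
import numpy as np


def direct_additions(t):
    """Additions for a direct 0/+-1 matrix-vector product: (nonzeros - 1) per row."""
    nnz = np.count_nonzero(t, axis=1)
    return int(np.sum(np.maximum(nnz - 1, 0)))


def fast_wht(x):
    """Radix-2 fast Walsh-Hadamard transform that also counts additions/subtractions."""
    y = np.array(x, dtype=float)
    n, h, adds = len(y), 1, 0
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = y[i], y[i + h]
                y[i], y[i + h] = a + b, a - b   # one butterfly: 2 additions
                adds += 2
        h *= 2
    return y, adds


h2 = np.array([[1, 1], [1, -1]])
h8 = np.kron(np.kron(h2, h2), h2)
x = np.arange(8.0)
y, fast_adds = fast_wht(x)
print(direct_additions(h8), fast_adds, np.allclose(y, h8 @ x))
```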
-
An Orthogonal 16-point Approximate DCT for Image and Video Compression
Authors:
T. L. T. da Silveira,
F. M. Bayer,
R. J. Cintra,
S. Kulasekera,
A. Madanayake,
A. J. Kozakevicius
Abstract:
A low-complexity orthogonal multiplierless approximation for the 16-point discrete cosine transform (DCT) is introduced. The proposed method was designed to have a very low computational cost. A fast algorithm based on matrix factorization is proposed, requiring only 60 additions. The proposed architecture outperforms classical and state-of-the-art algorithms when assessed as a tool for image and video compression. Digital VLSI hardware implementations are also proposed, physically realized in FPGA technology, and implemented in 45 nm technology up to the synthesis and place-and-route levels. Additionally, the proposed method was embedded into the high efficiency video coding (HEVC) reference software as a proof of concept. The results show negligible video degradation when compared with the Chen DCT algorithm in HEVC.
Submitted 26 May, 2016;
originally announced June 2016.
-
Multi-beam 4 GHz Microwave Apertures Using Current-Mode DFT Approximation on 65 nm CMOS
Authors:
V. Ariyarathna,
S. Kulasekera,
A. Madanayake,
D. Suarez,
R. J. Cintra,
F. M. Bayer,
L. Belostotski
Abstract:
A current-mode CMOS design is proposed for realizing receive-mode multi-beams in the analog domain using a novel DFT approximation. High-bandwidth CMOS RF transistors are employed in low-voltage current mirrors to achieve bandwidths exceeding 4 GHz with good beam fidelity. Current mirrors realize the coefficients of the considered DFT approximation, which take only the simple values $\{0, \pm1, \pm2\}$. This allows high-bandwidth realizations using simple circuitry, without phase shifters or delays. The proposed design efficiently performs a spatial discrete Fourier transform across a uniform linear array (ULA) to obtain multiple simultaneous RF beams. An example using a 1.2 V current-mode approximate DFT on 65 nm CMOS, with BSIM4 models from the RF kit, shows potential operation up to 4 GHz with eight independent aperture beams.
Submitted 23 May, 2015;
originally announced May 2015.
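To show what coefficients restricted to $\{0, \pm1, \pm2\}$ can look like, the sketch naively rounds a scaled 8-point DFT ($2W$) entry by entry. This is a hypothetical stand-in, not the approximation designed in the paper. Every real and imaginary part lands in $\{0, \pm1, \pm2\}$, so each coefficient product reduces to sign changes, a one-bit shift, and additions, and a plane wave aligned with beam $b$ still produces its largest output on beam $b$.

```python
import numpy as np

n = 8
k = np.arange(n)
dft = np.exp(-2j * np.pi * np.outer(k, k) / n)
approx = np.round(2 * dft)  # real and imaginary parts land in {0, +-1, +-2}


def beam_outputs(matrix, bin_index):
    """Beam magnitudes for a plane wave aligned with DFT bin `bin_index` on an n-element ULA."""
    x = np.exp(2j * np.pi * bin_index * k / n)
    return np.abs(matrix @ x)


for b in range(n):
    out = beam_outputs(approx, b)
    print(b, int(np.argmax(out)), np.round(out, 2))
```

The naive rounding distorts the odd-indexed beams (their peaks and leakage differ from the exact DFT), which is the kind of error a purpose-designed approximation would aim to control.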
-
A Row-parallel 8$\times$8 2-D DCT Architecture Using Algebraic Integer Based Exact Computation
Authors:
A. Madanayake,
R. J. Cintra,
D. Onen,
V. S. Dimitrov,
N. T. Rajapaksha,
L. T. Bruton,
A. Edirisuriya
Abstract:
An algebraic integer (AI) based time-multiplexed row-parallel architecture and two final-reconstruction step (FRS) algorithms are proposed for the implementation of the bivariate AI-encoded 2-D discrete cosine transform (DCT). The architecture directly realizes an error-free 2-D DCT without using FRSs between the row and column transforms, leading to an 8$\times$8 2-D DCT that is entirely free of quantization errors in the AI basis. As a result, the user-selectable accuracy of the coefficients in the FRS allows each of the 64 coefficients to have its precision set independently of the others, avoiding the leakage of quantization noise between channels that occurs in published DCT designs. The proposed FRS uses two approaches based on (i) optimized Dempster-Macleod multipliers and (ii) expansion factor scaling. This architecture enables low-noise, high-dynamic-range applications in digital video processing that require full control of the finite-precision computation of the 2-D DCT. The proposed architectures and FRS techniques are experimentally verified and validated using hardware implementations that are physically realized and verified on an FPGA chip. Six designs, for 4- and 8-bit input word sizes, using the two proposed FRS schemes, have been designed, simulated, physically implemented, and measured. The maximum clock rate and block rate achieved among the 8-bit input designs are 307.787 MHz and 38.47 MHz, respectively, implying a pixel rate of 8$\times$307.787$\approx$2.462 GHz if eventually embedded in a real-time video-processing system. The equivalent frame rate is about 1187.35 Hz for an image size of 1920$\times$1080. All implementations are functional on a Xilinx Virtex-6 XC6VLX240T FPGA device.
Submitted 14 February, 2015;
originally announced February 2015.
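The core idea of AI encoding can be sketched with a single irrational basis element: numbers of the form $a + b\sqrt{2}$ with integer $a$ and $b$ are added and multiplied exactly, and $\sqrt{2}$ is replaced by a fixed-point constant only in a final reconstruction step whose precision can be chosen per output. The paper uses a bivariate AI basis with the Arai algorithm and FRS schemes based on Dempster-Macleod multipliers or expansion factors; the univariate $\sqrt{2}$ basis, the class name `AI`, and the rounding-based `frs` method here are simplifying assumptions.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class AI:
    """Hypothetical algebraic integer a + b*sqrt(2) with integer a, b (exact arithmetic)."""
    a: int
    b: int

    def __add__(self, other):
        return AI(self.a + other.a, self.b + other.b)

    def __mul__(self, other):
        # (a1 + b1 r)(a2 + b2 r) = (a1 a2 + 2 b1 b2) + (a1 b2 + a2 b1) r, with r = sqrt(2)
        return AI(self.a * other.a + 2 * self.b * other.b,
                  self.a * other.b + self.b * other.a)

    def frs(self, frac_bits):
        """Final reconstruction step: map to fixed point with `frac_bits` fractional bits."""
        r_fixed = round(math.sqrt(2) * 2 ** frac_bits)    # sqrt(2) is quantized only here
        return (self.a * 2 ** frac_bits + self.b * r_fixed) / 2 ** frac_bits


x = AI(1, 1)                 # 1 + sqrt(2)
y = x * x + AI(3, 0)         # still exact: 6 + 2*sqrt(2)
print(y, y.frs(4), y.frs(16), 6 + 2 * math.sqrt(2))
```

Every intermediate result stays an exact integer pair, so no quantization error accumulates between stages; precision is traded only at the `frs` call.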
-
A Discrete Tchebichef Transform Approximation for Image and Video Coding
Authors:
P. A. M. Oliveira,
R. J. Cintra,
F. M. Bayer,
S. Kulasekera,
A. Madanayake
Abstract:
In this paper, we introduce a low-complexity approximation for the discrete Tchebichef transform (DTT). The proposed forward and inverse transforms are multiplication-free and require a reduced number of additions and bit-shifting operations. Numerical compression simulations demonstrate the efficiency of the proposed transform for image and video coding. Furthermore, a Xilinx Virtex-6 FPGA based hardware realization shows a 44.9% reduction in dynamic power consumption and 64.7% lower area when compared with designs in the literature.
Submitted 28 January, 2015;
originally announced February 2015.
-
Improved 8-point Approximate DCT for Image and Video Compression Requiring Only 14 Additions
Authors:
U. S. Potluri,
A. Madanayake,
R. J. Cintra,
F. M. Bayer,
S. Kulasekera,
A. Edirisuriya
Abstract:
The demand in the multimedia market for low-energy video processing systems, such as HEVC implementations, has led to extensive development of fast algorithms for efficiently approximating the 2-D DCT. The DCT is employed in a multitude of compression standards due to its remarkable energy compaction properties. Multiplier-free approximate DCT transforms have been proposed that offer superior compression performance at very low circuit complexity. Such approximations can be realized in digital VLSI hardware using only additions and subtractions, leading to significant reductions in chip area and power consumption compared with conventional DCTs and integer transforms. In this paper, we introduce a novel 8-point DCT approximation that requires only 14 addition operations and no multiplications. The proposed transform has low computational complexity and is compared with state-of-the-art DCT approximations in terms of both algorithmic complexity and peak signal-to-noise ratio. The proposed DCT approximation is a candidate for reconfigurable video standards such as HEVC. The proposed transform and several other DCT approximations are mapped to systolic-array digital architectures, physically realized as digital prototype circuits using FPGA technology, and mapped to 45 nm CMOS technology.
Submitted 13 January, 2015;
originally announced January 2015.
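PSNR, one of the two comparison criteria above, is the standard ratio $10\log_{10}(\text{peak}^2/\text{MSE})$ between a reference image and its reconstruction. A minimal sketch for 8-bit images; the function name and the default peak of 255 are assumptions:

```python
import numpy as np


def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    ref = np.asarray(reference, dtype=float)
    tst = np.asarray(test, dtype=float)
    mse = np.mean((ref - tst) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)


ref = np.zeros((8, 8))
print(psnr(ref, ref + 1.0))   # a uniform error of 1 gives MSE = 1, i.e. 20*log10(255) dB
```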
-
Multi-Beam RF Aperture Using Multiplierless FFT Approximation
Authors:
D. Suarez,
R. J. Cintra,
F. M. Bayer,
A. Sengupta,
S. Kulasekera,
A. Madanayake
Abstract:
Multiple independent radio frequency (RF) beams find applications in communications, radio astronomy, radar, and microwave imaging. An $N$-point FFT applied spatially across an array of receiver antennas provides $N$ independent RF beams at $\frac{N}{2}\log_2N$ multiplier complexity. Here, a low-complexity multiplierless approximation of the 8-point FFT is presented for RF beamforming, using only 26 additions. The algorithm provides eight beams that closely resemble the antenna array patterns of the traditional FFT-based beamformer, albeit without using multipliers. The proposed FFT-like algorithm is useful for low-power RF multi-beam receivers; it has been synthesized in 45 nm CMOS technology at a 1.1 V supply and verified on-chip using a Xilinx Virtex-6 Lx240T FPGA device. The CMOS simulation and FPGA implementation indicate bandwidths of 588 MHz and 369 MHz, respectively, for each of the independent receive-mode RF beams.
Submitted 8 January, 2015;
originally announced January 2015.
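The spatial-FFT idea can be sketched for a narrowband plane wave on an 8-element uniform linear array with half-wavelength spacing (both assumptions): one FFT across a single snapshot of the element signals forms all eight beams at once, and a source placed on a beam's pointing direction puts all of its energy into that beam. This uses the exact FFT, not the 26-addition approximation.

```python
import numpy as np

n = 8                                   # elements in a hypothetical half-wavelength ULA
elements = np.arange(n)


def plane_wave(sin_theta):
    """Narrowband snapshot across the array for a source at angle theta."""
    return np.exp(1j * np.pi * elements * sin_theta)


# FFT beam k peaks where the per-element phase step pi*sin(theta) equals 2*pi*k/n (mod 2*pi).
k_target = 3
phase = 2 * np.pi * k_target / n
sin_theta = np.angle(np.exp(1j * phase)) / np.pi    # wrap the phase to [-pi, pi)
beams = np.abs(np.fft.fft(plane_wave(sin_theta)))   # all n beams from one FFT
print(int(np.argmax(beams)), np.round(beams, 3))
print("N/2 * log2(N) =", n // 2 * int(np.log2(n)))  # multiplier count quoted in the abstract
```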
-
Multiplierless Approximate 4-point DCT VLSI Architectures for Transform Block Coding
Authors:
F. M. Bayer,
R. J. Cintra,
A. Madanayake,
U. S. Potluri
Abstract:
Two multiplierless algorithms are proposed for the 4x4 approximate DCT for transform coding in digital video. Computational architectures for 1-D/2-D realisations are implemented using Xilinx FPGA devices. CMOS synthesis at the 45 nm node indicates real-time operation at 1 GHz, yielding 4x4 block rates of 125 MHz at less than 120 mW of dynamic power consumption.
Submitted 2 May, 2014;
originally announced May 2014.
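For a sense of what a multiplierless 4-point transform looks like in practice, the sketch uses the well-known H.264/AVC 4x4 integer core transform, not either of the algorithms proposed in this paper. Its coefficients are $\pm1$ and $\pm2$, so each 1-D butterfly needs only additions, subtractions, and one-bit shifts, and a 2-D block transform is two separable 1-D passes.

```python
import numpy as np

# H.264/AVC 4x4 forward core transform, used here only as a familiar multiplierless example.
CORE = np.array([[1, 1, 1, 1],
                 [2, 1, -1, -2],
                 [1, -1, -1, 1],
                 [1, -2, 2, -1]])


def core_1d(x0, x1, x2, x3):
    """Butterfly form: 8 additions/subtractions and 2 left shifts, no multipliers."""
    s0, s1 = x0 + x3, x1 + x2
    d0, d1 = x0 - x3, x1 - x2
    return [s0 + s1, (d0 << 1) + d1, s0 - s1, d0 - (d1 << 1)]


def core_2d(block):
    """Separable 2-D transform: 1-D pass on every row, then on every column."""
    rows = [core_1d(*r) for r in block]
    cols = [core_1d(*c) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


blk = [[5, 11, 8, 10], [9, 8, 4, 12], [1, 10, 11, 4], [19, 6, 15, 7]]
print(core_2d(blk))
```

The result equals `CORE @ blk @ CORE.T`; the non-uniform row norms of `CORE` are normally absorbed into quantization rather than computed explicitly.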
-
A Multiplierless Pruned DCT-like Transformation for Image and Video Compression that Requires 10 Additions Only
Authors:
V. A. Coutinho,
R. J. Cintra,
F. M. Bayer,
S. Kulasekera,
A. Madanayake
Abstract:
A multiplierless pruned approximate 8-point discrete cosine transform (DCT) requiring only 10 additions is introduced. The proposed algorithm was assessed in image and video compression, showing competitive performance with state-of-the-art methods. Digital implementation in 45 nm CMOS technology up to the place-and-route level indicates a clock speed of 288 MHz at a 1.1 V supply. The 8x8 block rate is 36 MHz. The DCT approximation was embedded into the HEVC reference software; the resulting video frames, at up to 327 Hz for 8-bit RGB HEVC, presented negligible image degradation.
Submitted 11 December, 2016; v1 submitted 24 February, 2014;
originally announced February 2014.