-
Fast algorithms for complex-valued discrete Fourier transform with separate real and imaginary inputs/outputs
Authors:
Aleksandr Cariow
Abstract:
Fast Fourier transform algorithms are an arsenal of effective tools for solving various problems of analysis and high-speed processing of signals of various natures. Almost all of these algorithms are designed to process sequences of complex-valued data when each element of the sequence represents a single whole. However, in some cases, it is more advantageous to represent each element of the input and output sequences by a pair of real numbers. Such a need arises, for example, when further post-processing of spectral coefficients is carried out through two independent channels. Taking into account the noted need, the article proposes an algorithm for fast complex-valued discrete Fourier transform with separate real and imaginary inputs/outputs. A vector-matrix computational procedure is given that allows one to adequately describe and formalize the sequence of calculations when implementing the proposed algorithm.
Submitted 9 April, 2025;
originally announced April 2025.
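To make the "separate real and imaginary inputs/outputs" idea in the abstract above concrete, here is a minimal sketch of a direct O(N^2) DFT that keeps the two channels as independent real sequences. It is not the fast algorithm proposed in the paper; the function name `dft_split` is hypothetical and chosen only for illustration.

```python
import math


def dft_split(x_re, x_im):
    """Direct DFT on separate real and imaginary sequences (illustrative only).

    Returns two real lists (y_re, y_im); no complex type is used anywhere.
    """
    if len(x_re) != len(x_im):
        raise ValueError("real and imaginary parts must have equal length")
    n = len(x_re)
    y_re = [0.0] * n
    y_im = [0.0] * n
    for k in range(n):
        for t in range(n):
            angle = -2.0 * math.pi * k * t / n
            c, s = math.cos(angle), math.sin(angle)
            # (x_re + i*x_im) * (c + i*s), accumulated per channel
            y_re[k] += x_re[t] * c - x_im[t] * s
            y_im[k] += x_re[t] * s + x_im[t] * c
    return y_re, y_im
```

The two output lists can then be passed to independent post-processing channels, which is the use case the abstract mentions.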
-
An algorithm for dividing quaternions
Authors:
Aleksandr Cariow,
Galina Cariowa
Abstract:
In this work, a rationalized algorithm for calculating the quotient of two quaternions is presented which reduces the number of underlying real multiplications. Hardware for fast multiplication is much more expensive than hardware for fast addition. Therefore, reducing the number of multiplications in VLSI processor design is usually a desirable task. Performing a quaternion division using the naive method takes 16 multiplications, 15 additions, 4 squarings and 4 divisions of real numbers, while the proposed algorithm can compute the same result in only 8 multiplications (or multipliers in the case of a hardware implementation), 31 additions, 4 squarings and 4 divisions of real numbers.
Submitted 30 August, 2020;
originally announced September 2020.
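As a reference point for the operation counts in the abstract above, the following sketch implements only the naive quaternion division, assuming the quotient is taken as p * conj(q) / |q|^2 (right division). The rationalized 8-multiplication algorithm from the paper is not reproduced here, and the names `qmul` and `qdiv_naive` are hypothetical.

```python
def qmul(p, q):
    """Hamilton product of quaternions given as (w, x, y, z) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    # 16 real multiplications and 12 real additions
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def qdiv_naive(p, q):
    """Naive quotient p * q^{-1} = p * conj(q) / |q|^2."""
    a, b, c, d = q
    norm = a * a + b * b + c * c + d * d  # 4 squarings, 3 additions
    if norm == 0:
        raise ZeroDivisionError("division by the zero quaternion")
    conj = (a, -b, -c, -d)
    return tuple(v / norm for v in qmul(p, conj))  # 4 real divisions
```

Counting the operations in the comments gives 16 multiplications, 15 additions, 4 squarings and 4 divisions, which matches the figures the abstract quotes for the naive method.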
-
Minimal Filtering Algorithms for Convolutional Neural Networks
Authors:
Aleksandr Cariow,
Galina Cariowa
Abstract:
In this paper, we present several resource-efficient algorithmic solutions regarding the fully parallel hardware implementation of the basic filtering operation performed in the convolutional layers of convolutional neural networks. These basic operations calculate two inner products of neighboring vectors, formed by a sliding time window from the current data stream, with the impulse response of an M-tap finite impulse response filter. We use the Winograd minimal filtering trick to develop fully parallel hardware-oriented algorithms for implementing the basic filtering operation for M = 3, 5, 7, 9, and 11. A fully parallel hardware implementation of the proposed algorithms in each case gives approximately 30 percent savings in the number of embedded multipliers compared to a fully parallel hardware implementation of the naive calculation methods.
Submitted 12 April, 2020;
originally announced April 2020.
-
Hardware-Efficient Structure of the Accelerating Module for Implementation of Convolutional Neural Network Basic Operation
Authors:
Aleksandr Cariow,
Galina Cariowa
Abstract:
This paper presents a structural design of a hardware-efficient module for implementing the basic operation of a convolutional neural network (CNN) with reduced implementation complexity. For this purpose we utilize a modification of the Winograd minimal filtering method as well as computation vectorization principles. This module calculates inner products of two consecutive segments of the original data sequence, formed by a sliding window of length 3, with the elements of a filter impulse response. The fully parallel structure of the module for calculating these two inner products, based on the naive method of calculation, requires 6 binary multipliers and 4 binary adders. The use of the Winograd minimal filtering method makes it possible to construct a module structure that requires only 4 binary multipliers and 8 binary adders. Since a high-performance convolutional neural network can contain tens or even hundreds of such modules, such a reduction can have a significant effect.
Submitted 7 November, 2018;
originally announced November 2018.
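The 6-versus-4 multiplier trade-off described in the abstract above can be illustrated with the textbook Winograd F(2,3) minimal filtering formulas. This is a sketch of the standard scheme, not necessarily the modified variant used in the paper; it assumes the filter-side combinations are precomputed, and the function names are hypothetical.

```python
def two_outputs_naive(d, g):
    """Two neighboring 3-tap inner products: 6 multiplications, 4 additions."""
    y0 = d[0] * g[0] + d[1] * g[1] + d[2] * g[2]
    y1 = d[1] * g[0] + d[2] * g[1] + d[3] * g[2]
    return y0, y1


def two_outputs_winograd(d, g):
    """Textbook Winograd F(2,3): 4 multiplications for the same two outputs."""
    # Filter-side constants; assumed precomputed once for fixed coefficients.
    g_sum = (g[0] + g[1] + g[2]) / 2
    g_alt = (g[0] - g[1] + g[2]) / 2
    # 4 data-side additions and 4 multiplications
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * g_sum
    m3 = (d[2] - d[1]) * g_alt
    m4 = (d[1] - d[3]) * g[2]
    # 4 output-side additions
    return m1 + m2 + m3, m2 - m3 - m4
```

With the filter constants precomputed, the data path uses 4 multiplications and 8 additions, consistent with the counts given in the abstract.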
-
Some Schemes for Implementation of Arithmetic Operations with Complex Numbers Using Squaring Units
Authors:
Aleksandr Cariow,
Galina Cariowa
Abstract:
In this paper, new schemes for a squarer, multiplier and divider of complex numbers are proposed. Traditional structural solutions for each of these operations require some number of general-purpose binary multipliers. The advantage of our solutions is the removal of multipliers by replacing them with less costly squarers. We use Logan's trick and the quarter-square technique, which replace the calculation of the product of two real numbers by a sum of squares. Replacing ordinary multipliers with digital squarers reduces power consumption as well as hardware circuit complexity. Because a squarer requires less area and power than a general-purpose multiplier, it is worth assessing the use of squarers for implementing complex arithmetic.
Submitted 21 May, 2017;
originally announced May 2017.
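The idea of trading multipliers for squarers in the abstract above can be sketched with two well-known identities: the quarter-square identity ab = ((a+b)^2 - (a-b)^2)/4 and the sum-of-squares identity ab = ((a+b)^2 - a^2 - b^2)/2. That the second one is what the abstract calls Logan's trick is an assumption here, and the complex multiplier below is an illustration, not one of the paper's schemes. All function names are hypothetical.

```python
def mul_quarter_square(a, b):
    """Product via two squarings: ab = ((a + b)^2 - (a - b)^2) / 4."""
    return ((a + b) ** 2 - (a - b) ** 2) / 4


def mul_sum_of_squares(a, b):
    """Product via three squarings: ab = ((a + b)^2 - a^2 - b^2) / 2."""
    return ((a + b) ** 2 - a ** 2 - b ** 2) / 2


def complex_mul_squares(a, b, c, d):
    """(a + bi)(c + di) where every real product is formed from squarings."""
    re = mul_quarter_square(a, c) - mul_quarter_square(b, d)
    im = mul_quarter_square(a, d) + mul_quarter_square(b, c)
    return re, im
```

In hardware the divisions by 2 and 4 would be bit shifts, so only squarers and adders remain in the data path.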
-
Hardware-Efficient Schemes of Quaternion Multiplying Units for 2D Discrete Quaternion Fourier Transform Processors
Authors:
Aleksandr Cariow,
Galina Cariowa,
Marina Chicheva
Abstract:
In this paper, we offer and discuss three efficient structural solutions for the hardware-oriented implementation of the basic operations of the discrete quaternion Fourier transform with reduced implementation complexity. The first solution is a scheme for calculating the sq product, the second is a scheme for calculating the qt product, and the third is a scheme for calculating the sqt product, where s is a so-called i-quaternion, t is a j-quaternion, and q is an ordinary quaternion. The direct multiplication of two ordinary quaternions requires 16 real multiplications (or two-operand multipliers in the case of a fully parallel hardware implementation) and 12 real additions (or binary adders). Our solutions make it possible to design computation units that consume only 6 multipliers plus 6 two-input adders for the sq or qt basic operations, and 9 binary multipliers plus 6 two-input adders and 4 four-input adders for the sqt basic operation.
Submitted 18 March, 2017;
originally announced March 2017.
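The baseline cost quoted in the abstract above (16 real multiplications and 12 real additions for a direct quaternion product) is easy to see when the product is written as a matrix-vector product. The sketch below shows only that direct form for ordinary quaternions stored as (w, x, y, z); it does not implement the reduced sq, qt or sqt schemes, and the helper names are hypothetical.

```python
def left_matrix(p):
    """4x4 real matrix L(p) such that L(p) @ q equals the Hamilton product p*q."""
    a, b, c, d = p
    return [
        [a, -b, -c, -d],
        [b, a, -d, c],
        [c, d, a, -b],
        [d, -c, b, a],
    ]


def qmul_matvec(p, q):
    """Direct product: 4 rows x 4 multiplications, 3 additions per row."""
    return tuple(
        sum(entry * comp for entry, comp in zip(row, q))
        for row in left_matrix(p)
    )
```

The matrix contains only four distinct magnitudes arranged in a regular sign pattern, and it is this kind of structure that reduced-multiplication algorithms typically exploit.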
-
A Hardware-Efficient Approach to Computing the Rotation Matrix from a Quaternion
Authors:
Aleksandr Cariow,
Galina Cariowa
Abstract:
In this paper, we propose a novel VLSI-oriented approach to computing the rotation matrix entries from the quaternion coefficients. The advantage of this approach is the complete elimination of multiplications, which are replaced by less costly squarings. Our approach uses Logan's identity, which replaces the calculation of the product of two numbers by a sum of squares via the Binomial Theorem. Replacing multiplications by squarings reduces power consumption as well as hardware circuit complexity.
Submitted 6 September, 2016;
originally announced September 2016.
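A rough sketch of how the approach in the abstract above might look in software: every doubled product 2ab in the standard unit-quaternion rotation matrix is obtained as (a+b)^2 - a^2 - b^2, so only squarings, additions and a doubling remain. The identity used is assumed to be the one the abstract refers to as Logan's identity, the quaternion is assumed to be normalized and stored as (w, x, y, z), and the function name is hypothetical.

```python
def rotation_matrix_squarings(w, x, y, z):
    """Rotation matrix of a unit quaternion using squarings instead of products."""

    def square(v):
        return v * v

    ww, xx, yy, zz = square(w), square(x), square(y), square(z)
    # Doubled cross products: 2ab = (a + b)^2 - a^2 - b^2
    xy2 = square(x + y) - xx - yy
    wz2 = square(w + z) - ww - zz
    xz2 = square(x + z) - xx - zz
    wy2 = square(w + y) - ww - yy
    yz2 = square(y + z) - yy - zz
    wx2 = square(w + x) - ww - xx
    # The factor 2 on the diagonal is a constant doubling (a shift in hardware).
    return [
        [1 - 2 * (yy + zz), xy2 - wz2, xz2 + wy2],
        [xy2 + wz2, 1 - 2 * (xx + zz), yz2 - wx2],
        [xz2 - wy2, yz2 + wx2, 1 - 2 * (xx + yy)],
    ]
```

This version uses ten squarings in total: four for the coefficients and six for the pairwise sums.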
-
An algorithm for dividing two complex numbers
Authors:
Aleksandr Cariow
Abstract:
In this work a rationalized algorithm for calculating the quotient of two complex numbers is presented which reduces the number of underlying real multiplications. Performing a complex number division using the naive method takes 4 multiplications, 3 additions, 2 squarings and 2 divisions of real numbers, while the proposed algorithm can compute the same result in only 3 multiplications (or multipliers in the case of a hardware implementation), 6 additions, 2 squarings and 2 divisions of real numbers.
Submitted 30 August, 2016; v1 submitted 26 August, 2016;
originally announced August 2016.
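The operation counts in the abstract above can be reproduced with a short sketch. The three-multiplication variant below uses the classical Gauss trick for the numerator (a + bi)(c - di); whether this coincides with the paper's rationalized algorithm is an assumption, although the counts (3 multiplications, 6 additions, 2 squarings, 2 divisions) agree. Function names are hypothetical.

```python
def cdiv_naive(a, b, c, d):
    """(a + bi) / (c + di): 4 multiplications, 3 additions, 2 squarings, 2 divisions."""
    den = c * c + d * d
    return (a * c + b * d) / den, (b * c - a * d) / den


def cdiv_three_mult(a, b, c, d):
    """Same quotient with 3 multiplications and 6 additions (Gauss trick)."""
    den = c * c + d * d   # 2 squarings, 1 addition
    k1 = c * (a + b)      # ac + bc
    k2 = a * (c + d)      # ac + ad
    k3 = b * (c - d)      # bc - bd
    re = k1 - k3          # ac + bd
    im = k1 - k2          # bc - ad
    return re / den, im / den
```

For example, both functions return approximately (0.44, 0.08) for (1 + 2i) / (3 + 4i).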
-
An algorithm for discrete fractional Hadamard transform
Authors:
Aleksandr Cariow,
Dorota Majorkowska-Mech
Abstract:
We present a novel algorithm for calculating the discrete fractional Hadamard transform for data vectors whose size $N$ is a power of two. A direct method for calculating the discrete fractional Hadamard transform requires $N^2$ multiplications, while in the proposed algorithm the number of real multiplications is reduced to $N\log_2 N$.
Submitted 20 July, 2015;
originally announced July 2015.
-
An algorithm for fast computation of the multiresolution discrete Fourier transform
Authors:
Bartosz Andreatto,
Aleksandr Cariow
Abstract:
The article presents a computationally efficient algorithm for calculating the multiresolution discrete Fourier transform (MrDFT). The algorithm is based on the idea for reducing computational complexity introduced by Wen and Sandler [10] and vectorizes the calculation process at each stage of the transform. This makes it possible to parallelize the computation and thus reduce computation time. The computational procedure that describes the algorithm is given in matrix notation. This notation makes it possible to adequately represent the space-time structures of the implemented computational process and to map these structures directly into the constructs of a high-level programming language or into a hardware realization space.
Submitted 9 July, 2015;
originally announced July 2015.
-
An algorithm for multiplication of Kaluza numbers
Authors:
Aleksandr Cariow,
Galina Cariowa,
Rafał Łentek
Abstract:
This paper presents the derivation of a new algorithm for multiplying two Kaluza numbers. Performing this operation directly requires 1024 real multiplications and 992 real additions. The proposed algorithm can compute the same result with only 512 real multiplications and 576 real additions. The derivation of our algorithm is based on the fact that the multiplication of two Kaluza numbers can be expressed as a matrix-vector product. The matrix multiplicand that participates in the product calculation has unique structural properties. It is the exploitation of these specific properties that significantly reduces the complexity of Kaluza number multiplication.
Submitted 24 May, 2015;
originally announced May 2015.
-
An algorithm for multiplication of split-octonions
Authors:
Aleksandr Cariow,
Galina Cariowa,
Bartosz Kubsik
Abstract:
In this paper we introduce an efficient algorithm for the multiplication of split-octonions. The direct multiplication of two split-octonions requires 64 real multiplications and 56 real additions. More effective solutions do not yet exist. We show how to compute a product of split-octonions with 28 real multiplications and 92 real additions. In synthesizing the algorithm we use the fact that the product of two split-octonions may be represented as a vector-matrix product. The matrix that participates in the product calculation has unique structural properties that allow an advantageous decomposition. It is this decomposition that significantly reduces the multiplicative complexity of split-octonion multiplication.
Submitted 3 March, 2015;
originally announced March 2015.
-
Derivation of a low multiplicative complexity algorithm for multiplying hyperbolic octonions
Authors:
Aleksandr Cariow,
Galina Cariowa,
Jaroslaw Knapinski
Abstract:
We present an efficient algorithm to multiply two hyperbolic octonions. The direct multiplication of two hyperbolic octonions requires 64 real multiplications and 56 real additions. More effective solutions do not yet exist. We show how to compute a product of hyperbolic octonions with 26 real multiplications and 92 real additions. In synthesizing the algorithm we use the fact that the product of two hyperbolic octonions may be represented as a matrix-vector product. The matrix multiplicand that participates in the product calculation has unique structural properties that allow an advantageous factorization. It is this factorization that significantly reduces the computational complexity of hyperbolic octonion multiplication.
Submitted 22 February, 2015;
originally announced February 2015.
-
A new algorithm for multiplying two Dirac numbers
Authors:
Aleksandr Cariow,
Galina Cariowa
Abstract:
In this work a rationalized algorithm for Dirac number multiplication is presented. This algorithm has low computational complexity and is well suited to FPGA implementation. Computing the product of two Dirac numbers using the naïve method takes 256 real multiplications and 240 real additions, while the proposed algorithm can compute the same result in only 88 real multiplications and 256 real additions. In synthesizing the algorithm we use the fact that the product of Dirac numbers may be represented as a vector-matrix product. The matrix participating in the product has unique structural properties that allow an advantageous decomposition. It is this decomposition that significantly reduces the computational complexity.
Submitted 5 January, 2015;
originally announced January 2015.
-
A Hardware-oriented Algorithm for Complex-valued Constant Matrix-vector Multiplication
Authors:
Aleksandr Cariow,
Galina Cariowa
Abstract:
In this paper we present a hardware-oriented algorithm for calculating a constant matrix-vector product when all elements of the vector and the matrix are complex numbers. Compared with the naive method, the proposed algorithm drastically reduces the number of multipliers required for an FPGA implementation of complex-valued constant matrix-vector multiplication. While a fully parallel hardware implementation of the naive (schoolbook) method for complex-valued matrix-vector multiplication requires 4MN multipliers, 2M N-input adders and 2MN two-input adders, the proposed algorithm requires only 3N(M+1)/2 multipliers, 3M(N+2)+1.5N+2 two-input adders and 3(M+1) N/2-input adders.
Submitted 25 October, 2014;
originally announced October 2014.