Search | arXiv e-print repository

Fast algorithms for complex-valued discrete Fourier transform with separate real and imaginary inputs/outputs

Abstract: Fast Fourier transform algorithms are an arsenal of effective tools for solving various problems of analysis and high-speed processing of signals of various natures. Almost all of these algorithms are designed to process sequences of complex-valued data when each element of the sequence represents a single whole. However, in some cases, it is more advantageous to represent each element of the input and output sequences by a pair of real numbers. Such a need arises, for example, when further post-processing of spectral coefficients is carried out through two independent channels. Taking into account the noted need, the article proposes an algorithm for fast complex-valued discrete Fourier transform with separate real and imaginary inputs/outputs. A vector-matrix computational procedure is given that allows one to adequately describe and formalize the sequence of calculations when implementing the proposed algorithm.

Submitted 9 April, 2025; originally announced April 2025.

Comments: 4 pages, 2 figures

MSC Class: 65T50

arXiv:2009.00425 [pdf]

An algorithm for dividing quaternions

Authors: Aleksandr Cariow, Galina Cariowa

Abstract: In this work, a rationalized algorithm for calculating the quotient of two quaternions is presented which reduces the number of underlying real multiplications. Hardware for fast multiplication is much more expensive than hardware for fast addition. Therefore, reducing the number of multiplications in VLSI processor design is usually a desirable task. Performing a quaternion division using the naive method takes 16 multiplications, 15 additions, 4 squarings and 4 divisions of real numbers, while the proposed algorithm can compute the same result in only 8 multiplications (or multipliers, in the case of a hardware implementation), 31 additions, 4 squarings and 4 divisions of real numbers.

Submitted 30 August, 2020; originally announced September 2020.

Comments: 9 pages, 2 figures. arXiv admin note: text overlap with arXiv:1608.07596

MSC Class: 11R52; 65Y10; 65Y20; 68W10; 68W35 ACM Class: F.2.1; I.1.2; C.1.4; C.3

arXiv:2004.05607 [pdf]

Minimal Filtering Algorithms for Convolutional Neural Networks

Authors: Aleksandr Cariow, Galina Cariowa

Abstract: In this paper, we present several resource-efficient algorithmic solutions regarding the fully parallel hardware implementation of the basic filtering operation performed in the convolutional layers of convolutional neural networks. In fact, these basic operations calculate two inner products of neighboring vectors formed by a sliding time window from the current data stream with an impulse response of the M-tap finite impulse response filter. We use the Winograd minimal filtering trick and apply it to develop fully parallel hardware-oriented algorithms for implementing the basic filtering operation for M=3,5,7,9, and 11. A fully parallel hardware implementation of the proposed algorithms in each case gives approximately 30 percent savings in the number of embedded multipliers compared to a fully parallel hardware implementation of the naive calculation methods.

Submitted 12 April, 2020; originally announced April 2020.

Comments: 11 pages, 6 figures, 1 table

MSC Class: 62M45; 65Y10; 65Y20; 68W10; 62H35; 68U10; 68T10; 15A23 ACM Class: F.2.1; G.1.0; C.1.4; C.3; I.1.2; I.5.4

arXiv:1811.03458 [pdf]

Hardware-Efficient Structure of the Accelerating Module for Implementation of Convolutional Neural Network Basic Operation

Authors: Aleksandr Cariow, Galina Cariowa

Abstract: This paper presents a structural design of the hardware-efficient module for implementation of the convolutional neural network (CNN) basic operation with reduced implementation complexity. For this purpose we utilize a modification of the Winograd minimal filtering method as well as computation vectorization principles. This module calculates inner products of two consecutive segments of the original data sequence, formed by a sliding window of length 3, with the elements of a filter impulse response. The fully parallel structure of the module for calculating these two inner products, based on the implementation of a naive method of calculation, requires 6 binary multipliers and 4 binary adders. The use of the Winograd minimal filtering method makes it possible to construct a module structure that requires only 4 binary multipliers and 8 binary adders. Since a high-performance convolutional neural network can contain tens or even hundreds of such modules, such a reduction can have a significant effect.

Submitted 7 November, 2018; originally announced November 2018.

Comments: 3 pages, 5 figures
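
The operation counts quoted in the abstract above (6 multiplications and 4 additions for the naive method, 4 multiplications and 8 additions with Winograd) match the textbook Winograd F(2,3) minimal filtering scheme. The following Python sketch illustrates that textbook scheme only; it is not taken from the paper, and the function names `naive_pair` and `winograd_f23` are hypothetical. It assumes the two filter-side combinations are precomputed, as they would be for fixed filter coefficients.

```python
def naive_pair(d, g):
    """Two neighbouring inner products computed directly.

    Uses 6 multiplications and 4 additions.
    """
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    y0 = d0 * g0 + d1 * g1 + d2 * g2
    y1 = d1 * g0 + d2 * g1 + d3 * g2
    return y0, y1


def winograd_f23(d, g):
    """Textbook Winograd F(2,3): same two outputs with 4 multiplications.

    The filter-side terms gp and gm depend only on the filter, so they are
    assumed to be precomputed once and are not counted per output pair.
    """
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    gp = (g0 + g1 + g2) / 2  # precomputed for a fixed filter
    gm = (g0 - g1 + g2) / 2  # precomputed for a fixed filter

    # 4 data-side additions and 4 multiplications
    m1 = (d0 - d2) * g0
    m2 = (d1 + d2) * gp
    m3 = (d2 - d1) * gm
    m4 = (d1 - d3) * g2

    # 4 output-side additions, 8 additions in total
    y0 = m1 + m2 + m3
    y1 = m2 - m3 - m4
    return y0, y1
```

Calling `winograd_f23((1, 2, 3, 4), (1, 2, 3))` and `naive_pair((1, 2, 3, 4), (1, 2, 3))` should give the same pair of values, up to floating-point rounding.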

arXiv:1705.07465 [pdf]

Some Schemes for Implementation of Arithmetic Operations with Complex Numbers Using Squaring Units

Authors: Aleksandr Cariow, Galina Cariowa

Abstract: In this paper, new schemes for a squarer, multiplier and divider of complex numbers are proposed. Traditional structural solutions for each of these operations require some number of general-purpose binary multipliers. The advantage of our solutions is the removal of multiplications by replacing them with less costly squarers. We use Logan's trick and the quarter-square technique, which replace the calculation of the product of two real numbers by summing squares. Replacing the usual multipliers with digital squarers reduces power consumption as well as hardware circuit complexity. Since a squarer requires less area and power than a general-purpose multiplier, it is interesting to assess the use of squarers for implementing complex arithmetic.

Submitted 21 May, 2017; originally announced May 2017.

Comments: 3 pages, 3 figures, 2 tables

MSC Class: 15A23; 65Y20; 65F30 ACM Class: F.2.1, G.1.0, I.1.2
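
The quarter-square technique mentioned in the abstract above rests on the identity ab = ((a+b)^2 - (a-b)^2)/4. The Python sketch below shows how a complex product could be formed from squarings alone using that identity. It is an illustration under stated assumptions, not the scheme from the paper: the helper names `square`, `quarter_square_mul` and `complex_mul_squarers` are hypothetical, `square` stands in for a hardware squarer, and the division by 4 stands in for a binary shift.

```python
def square(x):
    """Stand-in for a hardware squaring unit."""
    return x * x


def quarter_square_mul(a, b):
    """Product of two reals via the quarter-square identity.

    ab = ((a + b)^2 - (a - b)^2) / 4, i.e. two squarings, three
    additions/subtractions and a division by 4 (a shift in hardware).
    """
    return (square(a + b) - square(a - b)) / 4


def complex_mul_squarers(a, b, c, d):
    """(a + bi)(c + di) with every real product replaced by squarings.

    This direct substitution uses 8 squarings; it makes no attempt to
    share or minimise squarers.
    """
    real = quarter_square_mul(a, c) - quarter_square_mul(b, d)
    imag = quarter_square_mul(a, d) + quarter_square_mul(b, c)
    return real, imag
```

For example, `complex_mul_squarers(1, 2, 3, 4)` corresponds to (1 + 2i)(3 + 4i) = -5 + 10i.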

arXiv:1703.06320 [pdf]

Hardware-Efficient Schemes of Quaternion Multiplying Units for 2D Discrete Quaternion Fourier Transform Processors

Authors: Aleksandr Cariow, Galina Cariowa, Marina Chicheva

Abstract: In this paper, we offer and discuss three efficient structural solutions for the hardware-oriented implementation of discrete quaternion Fourier transform basic operations with reduced implementation complexities. The first solution is a scheme for calculating the sq product, the second is a scheme for calculating the qt product, and the third is a scheme for calculating the sqt product, where s is a so-called i-quaternion, t is a j-quaternion, and q is a usual quaternion. The direct multiplication of two usual quaternions requires 16 real multiplications (or two-operand multipliers in the case of fully parallel hardware implementation) and 12 real additions (or binary adders). At the same time, our solutions make it possible to design computation units that consume only 6 multipliers plus 6 two-input adders for implementation of the sq or qt basic operations, and 9 binary multipliers plus 6 two-input adders and 4 four-input adders for implementation of the sqt basic operation.

Submitted 18 March, 2017; originally announced March 2017.

Comments: 3 pages, 3 figures

MSC Class: 65T50; 15A04; 15A66; 15A66; 15A69; 03D15; 65Y20; 65Y10 ACM Class: F.2.1; I.1.2; C.1.4; C.3

arXiv:1609.01585 [pdf]

A Hardware-Efficient Approach to Computing the Rotation Matrix from a Quaternion

Authors: Aleksandr Cariow, Galina Cariowa

Abstract: In this paper, we have proposed a novel VLSI-oriented approach to computing the rotation matrix entries from the quaternion coefficients. The advantage of this approach is the complete elimination of multiplications, which are replaced by less costly squarings. Our approach uses Logan's identity, which replaces the calculation of the product of two numbers by summing squares via the Binomial Theorem. Replacing multiplications by squarings reduces power consumption as well as hardware circuit complexity.

Submitted 6 September, 2016; originally announced September 2016.

Comments: 5 pages, 5 figures

MSC Class: 65Y04; 65Y05; 65Y10; 65Y20; 68M07 ACM Class: B.7.1; C.1.2; C.1.4; C.3; F.2.1; G.1.0; I.3.1; I.3.7
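
To illustrate how squarings alone can yield rotation matrix entries, the Python sketch below uses the binomial expansion (a+b)^2 = a^2 + 2ab + b^2, rearranged as 2ab = (a+b)^2 - a^2 - b^2, which is assumed here to be the identity the abstract above calls Logan's identity. The sketch is not taken from the paper. It assumes a unit quaternion in scalar-first order (w, x, y, z) and the standard rotation matrix formula; the function names `sq` and `rotation_matrix_squarings` are hypothetical.

```python
def sq(v):
    """Stand-in for a hardware squaring unit."""
    return v * v


def rotation_matrix_squarings(w, x, y, z):
    """Rotation matrix of a unit quaternion (w, x, y, z) using only squarings.

    Every doubled product 2ab is obtained as (a + b)^2 - a^2 - b^2,
    so the routine needs 10 squarings and no general multiplications.
    """
    ww, xx, yy, zz = sq(w), sq(x), sq(y), sq(z)

    # Doubled cross products, e.g. xy2 == 2 * x * y
    xy2 = sq(x + y) - xx - yy
    xz2 = sq(x + z) - xx - zz
    yz2 = sq(y + z) - yy - zz
    wx2 = sq(w + x) - ww - xx
    wy2 = sq(w + y) - ww - yy
    wz2 = sq(w + z) - ww - zz

    # Diagonal entries in the form valid for a unit quaternion
    return [
        [ww + xx - yy - zz, xy2 - wz2, xz2 + wy2],
        [xy2 + wz2, ww - xx + yy - zz, yz2 - wx2],
        [xz2 - wy2, yz2 + wx2, ww - xx - yy + zz],
    ]
```

As a quick check, the quaternion (cos 45°, 0, 0, sin 45°) describes a 90° rotation about the z axis, so the result should be close to [[0, -1, 0], [1, 0, 0], [0, 0, 1]].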

arXiv:1608.07596 [pdf]

An algorithm for dividing two complex numbers

Authors: Aleksandr Cariow

Abstract: In this work a rationalized algorithm for calculating the quotient of two complex numbers is presented which reduces the number of underlying real multiplications. Performing a complex number division using the naive method takes 4 multiplications, 3 additions, 2 squarings and 2 divisions of real numbers, while the proposed algorithm can compute the same result in only 3 multiplications (or multipliers, in the case of a hardware implementation), 6 additions, 2 squarings and 2 divisions of real numbers.

Submitted 30 August, 2016; v1 submitted 26 August, 2016; originally announced August 2016.

Comments: 4 pages, 1 figure

MSC Class: 15A23; 65Y20; 65F30 ACM Class: F.2.1; G.1.0; I.1.2
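
The operation counts in the abstract above can be reproduced with a well-known rearrangement of the numerator product (the three-multiplication trick usually attributed to Gauss). The Python sketch below is an illustration under that assumption and is not necessarily the algorithm from the paper; the function names `divide_naive` and `divide_three_mults` are hypothetical. Both compute (a + bi)/(c + di) = ((ac + bd) + (bc - ad)i)/(c^2 + d^2).

```python
def divide_naive(a, b, c, d):
    """(a + bi) / (c + di) directly.

    4 multiplications, 3 additions, 2 squarings, 2 divisions.
    """
    denom = c * c + d * d          # 2 squarings, 1 addition
    real = (a * c + b * d) / denom  # 2 multiplications, 1 addition
    imag = (b * c - a * d) / denom  # 2 multiplications, 1 addition
    return real, imag


def divide_three_mults(a, b, c, d):
    """Same quotient with 3 multiplications and 6 additions.

    Uses k1 - k3 = ac + bd and k1 - k2 = bc - ad.
    """
    denom = c * c + d * d  # 2 squarings, 1 addition
    k1 = c * (a + b)       # 1 addition, 1 multiplication
    k2 = a * (c + d)       # 1 addition, 1 multiplication
    k3 = b * (c - d)       # 1 addition, 1 multiplication
    real = (k1 - k3) / denom  # 1 addition, 1 division
    imag = (k1 - k2) / denom  # 1 addition, 1 division
    return real, imag
```

For example, (1 + 2i)/(3 + 4i) = (11 + 2i)/25, so both functions should return values close to (0.44, 0.08) for the arguments `(1, 2, 3, 4)`.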

arXiv:1507.05387 [pdf, other]

An algorithm for discrete fractional Hadamard transform

Authors: Aleksandr Cariow, Dorota Majorkowska-Mech

Abstract: We present a novel algorithm for calculating the discrete fractional Hadamard transform for data vectors whose size $N$ is a power of two. A direct method for calculation of the discrete fractional Hadamard transform requires $N^2$ multiplications, while in the proposed algorithm the number of real multiplications is reduced to $N\log_2 N$.

Submitted 20 July, 2015; originally announced July 2015.

Comments: 22 pages, 4 figures

MSC Class: 15A04; 15A23; 65Y20 ACM Class: F.2.1; I.1.2

arXiv:1507.02525 [pdf]

An algorithm for fast computation of the multiresolution discrete Fourier transform

Authors: Bartosz Andreatto, Aleksandr Cariow

Abstract: The article presents a computationally effective algorithm for calculating the multiresolution discrete Fourier transform (MrDFT). The algorithm is based on the idea of reducing the computational complexity which was introduced by Wen and Sandler [10] and utilizes the vectorization of the calculation process at each stage of the considered transformation. This allows the computational process to be parallelized and results in a reduction of computation time. In the description of the computational procedure, which describes the algorithm, we use matrix notation. This notation makes it possible to adequately represent the space-time structures of the implemented computational process and to map these structures directly into the constructions of a high-level programming language or into a hardware realization space.

Submitted 9 July, 2015; originally announced July 2015.

Comments: 8 pages, 2 figures

MSC Class: 15A23; 15A04; 65Y20 ACM Class: F.2.1; G.1.0; I.1.2

arXiv:1505.06425 [pdf, other]

An algorithm for multipication of Kaluza numbers

Authors: Aleksandr Cariow, Galina Cariowa, Rafał Łentek

Abstract: This paper presents the derivation of a new algorithm for multiplying two Kaluza numbers. Performing this operation directly requires 1024 real multiplications and 992 real additions. The proposed algorithm can compute the same result with only 512 real multiplications and 576 real additions. The derivation of our algorithm is based on the fact that multiplication of two Kaluza numbers can be expressed as a matrix-vector product. The matrix multiplicand that participates in the product calculation has unique structural properties. It is the exploitation of these specific properties that leads to a significant reduction in the complexity of Kaluza numbers multiplication.

Submitted 24 May, 2015; originally announced May 2015.

Comments: 22 pages, 3 figures

MSC Class: 15A23; 15A66; 15A69; 65F30; 65Y20 ACM Class: F.2.1; I.1.2

arXiv:1503.01058 [pdf]

An algorithm for multiplication of split-octonions

Authors: Aleksandr Cariow, Galina Cariowa, Bartosz Kubsik

Abstract: In this paper we introduce an efficient algorithm for the multiplication of split-octonions. The direct multiplication of two split-octonions requires 64 real multiplications and 56 real additions. More effective solutions still do not exist. We show how to compute a product of the split-octonions with 28 real multiplications and 92 real additions. During synthesis of the discussed algorithm we use the fact that the product of two split-octonions may be represented as a vector-matrix product. The matrix that participates in the product calculation has unique structural properties that allow its advantageous decomposition. It is this decomposition that leads to a significant reduction in the multiplicative complexity of split-octonions multiplication.

Submitted 3 March, 2015; originally announced March 2015.

Comments: 14 pages, 4 figures. arXiv admin note: substantial text overlap with arXiv:1502.06250

MSC Class: 15A23; 15A66; 65F30; 65Y20 ACM Class: F.2.1; I.1.2

arXiv:1502.06250 [pdf]

Derivation of a low multiplicative complexity algorithm for multiplying hyperbolic octonions

Authors: Aleksandr Cariow, Galina Cariowa, Jaroslaw Knapinski

Abstract: We present an efficient algorithm to multiply two hyperbolic octonions. The direct multiplication of two hyperbolic octonions requires 64 real multiplications and 56 real additions. More effective solutions still do not exist. We show how to compute a product of the hyperbolic octonions with 26 real multiplications and 92 real additions. During synthesis of the discussed algorithm we use the fact that the product of two hyperbolic octonions may be represented as a matrix-vector product. The matrix multiplicand that participates in the product calculation has unique structural properties that allow its advantageous factorization. It is this factorization that leads to a significant reduction in the computational complexity of hyperbolic octonions multiplication.

Submitted 22 February, 2015; originally announced February 2015.

Comments: 15 pages, 4 figures

MSC Class: 15A66; 65Y20; 65F30; 15A23

arXiv:1501.00828 [pdf]

A new algorithm for multiplying two Dirac numbers

Authors: Aleksandr Cariow, Galina Cariowa

Abstract: In this work a rationalized algorithm for Dirac numbers multiplication is presented. This algorithm has low computational complexity and is well suited to FPGA implementation. The computation of the product of two Dirac numbers using the naïve method takes 256 real multiplications and 240 real additions, while the proposed algorithm can compute the same result in only 88 real multiplications and 256 real additions. During synthesis of the discussed algorithm we use the fact that the Dirac numbers product may be represented as a vector-matrix product. The matrix participating in the product has unique structural properties that allow its advantageous decomposition. It is this decomposition that leads to a significant reduction in the computational complexity.

Submitted 5 January, 2015; originally announced January 2015.

Comments: 14 pages, 1 figure

MSC Class: 15B33; 11R52; 65F30; 65Y20; 68W35

arXiv:1410.6937 [pdf]

A Hardware-oriented Algorithm for Complex-valued Constant Matrix-vector Multiplication

Authors: Aleksandr Cariow, Galina Cariowa

Abstract: In this paper we present a hardware-oriented algorithm for calculating a constant matrix-vector product when all elements of the vector and matrix are complex numbers. Compared with the naive method of analogous calculations, the proposed algorithm drastically reduces the number of multipliers required for FPGA implementation of complex-valued constant matrix-vector multiplication. Whereas the fully parallel hardware implementation of the naive (schoolbook) method for complex-valued matrix-vector multiplication requires 4MN multipliers, 2M N-input adders and 2MN two-input adders, the proposed algorithm requires only 3N(M+1)/2 multipliers, 3M(N+2)+1.5N+2 two-input adders and 3(M+1) N/2-input adders.

Submitted 25 October, 2014; originally announced October 2014.

Comments: 4 pages, 3 figures

MSC Class: 65F30; 68W10; 68W35 ACM Class: B.2.4; C.1.4; F.2.1

Showing 1–15 of 15 results for author: Cariow, A