-
Mirror Descent Using the Tempesta Generalized Multi-parametric Logarithms
Authors:
Andrzej Cichocki
Abstract:
In this paper, we develop a wide class of Mirror Descent (MD) algorithms, which play a key role in machine learning. For this purpose, we formulate a constrained optimization problem in which we exploit the Bregman divergence with the Tempesta multi-parametric deformed logarithm as a link function. This link function, also called the mirror function, defines the mapping between the primal and dual spaces and is associated with a very wide (in fact, theoretically infinite) class of generalized trace-form entropies. To derive novel MD updates, we estimate a generalized exponential function that closely approximates the inverse of the multi-parametric Tempesta generalized logarithm. The shape and properties of the Tempesta logarithm and its inverse deformed exponential function can be tuned by several hyperparameters. By learning these hyperparameters, we can adapt to the distribution or geometry of the training data and adjust them to achieve desired properties of MD algorithms. Applying multi-parametric logarithms allows us to generate a new, wide and flexible family of MD and mirror-less MD updates.
Submitted 8 June, 2025;
originally announced June 2025.
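The Tempesta logarithm itself is not reproduced here. As a minimal sketch of the mirror descent mechanism the abstract above describes, the block below swaps in the one-parameter Tsallis q-logarithm as the link (mirror) function; this substitution, the names `log_q`, `exp_q` and `mirror_descent_step`, and the toy objective are all assumptions for illustration. Each step maps the iterate to the dual space with the link function, takes a gradient step there, and maps back with its inverse, the deformed exponential.

```python
import numpy as np


def log_q(x, q):
    """Tsallis q-logarithm for x > 0 (stand-in link function); equals np.log at q = 1."""
    if abs(q - 1.0) < 1e-12:
        return np.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def exp_q(y, q):
    """Tsallis q-exponential, the inverse of log_q on its range.

    This sketch is meant for q <= 1; the base is clipped at zero (the usual cutoff).
    """
    if abs(q - 1.0) < 1e-12:
        return np.exp(y)
    base = np.maximum(1.0 + (1.0 - q) * y, 0.0)
    return base ** (1.0 / (1.0 - q))


def mirror_descent_step(w, grad, lr, q):
    """Unnormalized mirror descent step: log_q(w_new) = log_q(w) - lr * grad."""
    return exp_q(log_q(w, q) - lr * grad, q)


# Usage: minimize f(w) = 0.5 * ||w - target||^2 over positive w with q = 0.5.
target = np.array([0.5, 1.0, 2.0])
w = np.ones(3)
for _ in range(200):
    w = mirror_descent_step(w, w - target, lr=0.1, q=0.5)
print(w)
```

At q = 1 the step reduces to the familiar exponentiated-gradient update w * exp(-lr * grad); other values of q change the geometry of the dual-space step, which is the kind of tunable behavior the abstract attributes to the multi-parametric logarithms.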
-
Towards Understanding Normalization in Neural ODEs
Authors:
Julia Gusak,
Larisa Markeeva,
Talgat Daulbaev,
Alexandr Katrutsa,
Andrzej Cichocki,
Ivan Oseledets
Abstract:
Normalization is an important and extensively investigated technique in deep learning. However, its role in networks based on Ordinary Differential Equations (neural ODEs) is still poorly understood. This paper investigates how different normalization techniques affect the performance of neural ODEs. In particular, we show that it is possible to achieve 93% accuracy on the CIFAR-10 classification task, which, to the best of our knowledge, is the highest reported accuracy among neural ODEs tested on this problem.
Submitted 27 April, 2020; v1 submitted 20 April, 2020;
originally announced April 2020.
-
Interpolation Technique to Speed Up Gradients Propagation in Neural ODEs
Authors:
Talgat Daulbaev,
Alexandr Katrutsa,
Larisa Markeeva,
Julia Gusak,
Andrzej Cichocki,
Ivan Oseledets
Abstract:
We propose a simple interpolation-based method for efficiently approximating gradients in neural ODE models. We compare it with the reverse dynamic method (known in the literature as the "adjoint method") for training neural ODEs on classification, density estimation, and inference approximation tasks. We also provide a theoretical justification of our approach using the logarithmic norm formalism. As a result, our method trains models faster than the reverse dynamic method, as confirmed and validated by extensive numerical experiments on several standard benchmarks.
Submitted 30 October, 2020; v1 submitted 11 March, 2020;
originally announced March 2020.
-
Block Hankel Tensor ARIMA for Multiple Short Time Series Forecasting
Authors:
Qiquan Shi,
Jiaming Yin,
Jiajun Cai,
Andrzej Cichocki,
Tatsuya Yokota,
Lei Chen,
Mingxuan Yuan,
Jia Zeng
Abstract:
This work proposes a novel approach for multiple time series forecasting. First, the multi-way delay embedding transform (MDT) is employed to represent time series as low-rank block Hankel tensors (BHT). Then, the higher-order tensors are projected to compressed core tensors by applying Tucker decomposition. At the same time, the generalized tensor Autoregressive Integrated Moving Average (ARIMA) model is explicitly applied to consecutive core tensors to predict future samples. In this manner, the proposed approach combines the unique advantages of MDT tensorization (to exploit mutual correlations) and tensor ARIMA coupled with low-rank Tucker decomposition in a unified framework. This framework exploits the low-rank structure of block Hankel tensors in the embedded space and captures the intrinsic correlations among multiple time series, which can improve the forecasting results, especially for multiple short time series. Experiments conducted on three public datasets and two industrial datasets verify that the proposed BHT-ARIMA effectively improves forecasting accuracy and reduces computational cost compared with state-of-the-art methods.
Submitted 25 February, 2020;
originally announced February 2020.
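The first step of BHT-ARIMA, the MDT, turns each short series into a Hankel structure by delay embedding. A minimal sketch of that temporal-mode embedding and its inverse, assuming a window length `tau` chosen by hand; the helper names are illustrative, and the Tucker projection and tensor ARIMA steps are not reproduced.

```python
import numpy as np


def delay_embed(X, tau):
    """Hankelize each series (row of X, shape N x T) along the time mode.

    Returns an array of shape (N, tau, T - tau + 1) with
    H[n, i, j] = X[n, i + j], i.e. one Hankel matrix per series.
    """
    N, T = X.shape
    if not 1 <= tau <= T:
        raise ValueError("tau must satisfy 1 <= tau <= T")
    K = T - tau + 1
    idx = np.arange(tau)[:, None] + np.arange(K)[None, :]  # (tau, K) time indices
    return X[:, idx]


def inverse_delay_embed(H):
    """Map a Hankel tensor back to series by averaging over its anti-diagonals."""
    N, tau, K = H.shape
    T = tau + K - 1
    out = np.zeros((N, T))
    counts = np.zeros(T)
    for i in range(tau):
        out[:, i:i + K] += H[:, i, :]
        counts[i:i + K] += 1
    return out / counts


X = np.arange(12, dtype=float).reshape(2, 6)  # two short series of length 6
H = delay_embed(X, tau=3)
print(H.shape)  # (2, 3, 4)
```

Averaging anti-diagonals is what lets a model work in the embedded space (for example on a low-rank approximation of `H`) and still return ordinary time series at the end.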
-
Reduced-Order Modeling of Deep Neural Networks
Authors:
Julia Gusak,
Talgat Daulbaev,
Evgeny Ponomarev,
Andrzej Cichocki,
Ivan Oseledets
Abstract:
We introduce a new method for speeding up the inference of deep neural networks. It is partly inspired by reduced-order modeling techniques for dynamical systems. The cornerstone of the proposed method is the maximum volume algorithm. We demonstrate its efficiency on neural networks pre-trained on different datasets. We show that in many practical cases it is possible to replace convolutional layers with much smaller fully-connected layers with a relatively small drop in accuracy.
Submitted 25 November, 2020; v1 submitted 15 October, 2019;
originally announced October 2019.
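The abstract names the maximum volume (maxvol) algorithm as its cornerstone. Below is a minimal sketch of the classic maxvol row-selection iteration in NumPy; how the paper applies it to network layers is not reproduced, and `greedy_rows` and `maxvol` are our own illustrative names. Given a tall n x r matrix of full column rank, it searches for r rows whose square submatrix has locally maximal |det|, so that every row can be expressed through the selected ones with coefficients no larger than `tol` in magnitude.

```python
import numpy as np


def greedy_rows(A):
    """Pick r = A.shape[1] independent rows greedily (pivoted Gram-Schmidt); assumes full column rank."""
    R = np.array(A, dtype=float)
    idx = []
    for _ in range(A.shape[1]):
        norms = np.linalg.norm(R, axis=1)
        norms[idx] = -1.0                       # never pick a row twice
        i = int(np.argmax(norms))
        idx.append(i)
        v = R[i] / np.linalg.norm(R[i])
        R = R - np.outer(R @ v, v)              # remove that direction from all rows
    return idx


def maxvol(A, tol=1.05, max_iter=1000):
    """Return r row indices of the n x r matrix A whose r x r submatrix has
    (locally) maximal volume |det|, using the classic row-swap iteration."""
    idx = greedy_rows(A)
    for _ in range(max_iter):
        B = A @ np.linalg.inv(A[idx])           # coefficients; B[idx] is the identity
        i, j = np.unravel_index(np.argmax(np.abs(B)), B.shape)
        if abs(B[i, j]) <= tol:
            break                               # no single swap grows |det| by more than tol
        idx[j] = int(i)                         # the swap multiplies |det| by |B[i, j]|
    return idx


rng = np.random.default_rng(0)
A = rng.standard_normal((200, 5))
rows = maxvol(A)
```

Each accepted swap multiplies the submatrix volume by a factor larger than `tol`, which is why the loop terminates; `tol` slightly above 1 trades optimality for fewer iterations.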
-
Multi-Kernel Capsule Network for Schizophrenia Identification
Authors:
Tian Wang,
Anastasios Bezerianos,
Andrzej Cichocki,
Junhua Li
Abstract:
Objective: Schizophrenia seriously affects quality of life. To date, both simple (linear discriminant analysis) and complex (deep neural network) machine learning methods have been used to identify schizophrenia from functional connectivity features. The existing simple methods need two separate steps (i.e., feature extraction and classification) to achieve identification, which prevents feature extraction and classifier training from being tuned jointly. The complex methods integrate the two steps and can be tuned jointly to achieve optimal performance, but they require a much larger amount of training data. Methods: To overcome these drawbacks, we proposed a multi-kernel capsule network (MKCapsnet), designed with the brain's anatomical structure in mind. Kernel sizes were set to match the partition sizes of the brain's anatomical structure in order to capture interregional connectivity at varying scales. Inspired by the widely used dropout strategy in deep learning, we developed vector dropout in the capsule layer to prevent overfitting of the model. Results: Comparisons showed that the proposed method outperformed state-of-the-art methods. In addition, we compared performance across different parameter settings and illustrated the routing process to reveal characteristics of the proposed method. Conclusion: MKCapsnet is promising for schizophrenia identification. Significance: Our study not only proposed a multi-kernel capsule network but also provided useful information on parameter settings, which is informative for further studies using capsule networks for neurophysiological signal classification.
Submitted 30 July, 2019;
originally announced July 2019.
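The abstract does not spell out vector dropout. One natural reading, assumed here rather than taken from the paper, is that whole capsule vectors are dropped instead of individual scalars, so a capsule either passes through intact or is silenced. A minimal sketch under that assumption; the function name and the inverted-dropout rescaling by 1/(1 - p) are also our choices.

```python
import numpy as np


def vector_dropout(capsules, p, rng, training=True):
    """Drop whole capsule vectors instead of individual scalars (hypothetical reading).

    capsules: array of shape (batch, n_capsules, dim).
    Each capsule is zeroed with probability p; survivors are rescaled by
    1 / (1 - p) (inverted-dropout convention, an assumption of this sketch).
    """
    if not training or p == 0.0:
        return capsules
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    keep = rng.random(capsules.shape[:-1]) >= p      # one Bernoulli draw per capsule
    return capsules * keep[..., None] / (1.0 - p)


rng = np.random.default_rng(0)
u = rng.standard_normal((4, 8, 16))   # batch of 4, 8 capsules of dimension 16
v = vector_dropout(u, p=0.5, rng=rng)
```

Dropping at the vector level keeps each surviving capsule's orientation unchanged, which matters because routing in capsule networks compares capsule directions.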
-
MUSCO: Multi-Stage Compression of neural networks
Authors:
Julia Gusak,
Maksym Kholiavchenko,
Evgeny Ponomarev,
Larisa Markeeva,
Ivan Oseledets,
Andrzej Cichocki
Abstract:
Low-rank tensor approximation is very promising for compressing deep neural networks. We propose a new, simple and efficient iterative approach that alternates low-rank factorization with smart rank selection and fine-tuning. We demonstrate the efficiency of our method compared to non-iterative ones. Our approach improves the compression rate while maintaining accuracy for a variety of tasks.
Submitted 15 November, 2019; v1 submitted 24 March, 2019;
originally announced March 2019.
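MUSCO alternates low-rank factorization, rank selection and fine-tuning. A minimal sketch of a single factorization step for the simplest case, a dense (fully-connected) weight matrix, using truncated SVD with a hypothetical energy-threshold rank rule; the paper's actual rank-selection criterion is not reproduced, and fine-tuning, which needs a model and training data, is omitted. Replacing an m x n layer by two layers of sizes m x r and r x n cuts the parameter count from mn to r(m + n).

```python
import numpy as np


def low_rank_factorize(W, energy=0.95):
    """Replace W (m x n) by U (m x r) @ V (r x n) via truncated SVD.

    r is the smallest rank that keeps `energy` of the squared singular values,
    a simple stand-in for a learned or Bayesian rank-selection rule.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    cum = np.cumsum(s ** 2) / np.sum(s ** 2)
    r = min(int(np.searchsorted(cum, energy)) + 1, len(s))
    return U[:, :r] * s[:r], Vt[:r]


rng = np.random.default_rng(0)
W = rng.standard_normal((256, 8)) @ rng.standard_normal((8, 512))  # rank-8 weights
W += 0.01 * rng.standard_normal(W.shape)                            # small noise
U, V = low_rank_factorize(W)
print(U.shape[1], W.size, U.size + V.size)  # rank, params before, params after
```

In an iterative scheme one would fine-tune after each such step and then compress again, rather than choosing a single aggressive rank up front.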
-
Brain-Computer Interface with Corrupted EEG Data: A Tensor Completion Approach
Authors:
Jordi Sole-Casals,
Cesar F. Caiafa,
Qibin Zhao,
Andrzej Cichocki
Abstract:
One of the current issues in Brain-Computer Interfaces (BCIs) is how to deal with noisy Electroencephalography (EEG) measurements organized as multidimensional datasets. Meanwhile, significant advances have recently been made in multidimensional signal completion algorithms that exploit tensor decomposition models to capture the intricate relationships among entries in a multidimensional signal. We propose to apply tensor completion to EEG data to improve classification performance in a motor imagery BCI system with corrupted measurements. Noisy measurements are treated as unknowns that are inferred from a tensor decomposition model. We evaluate the performance of four recently proposed tensor completion algorithms plus a simple interpolation strategy, first with random missing entries and then with missing samples constrained to have a specific structure (random missing channels), which is a more realistic assumption in BCI applications. We measured the ability of these algorithms to reconstruct the tensor from observed data. Then, we tested the classification accuracy of imagined movement in a BCI experiment with missing samples. We show that for random missing entries, all tensor completion algorithms can recover missing samples, increasing classification performance compared to a simple interpolation approach. For the random missing channels case, we show that tensor completion algorithms help to reconstruct missing channels, significantly improving motor imagery classification accuracy, although not to the level obtained with clean data. Tensor completion algorithms are useful in real BCI applications. The proposed strategy could allow motor imagery BCI systems to be used even when EEG data are highly affected by missing channels and/or samples, avoiding the need for new acquisitions in the calibration stage.
Submitted 26 July, 2018; v1 submitted 13 June, 2018;
originally announced June 2018.
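As a simple stand-in for the tensor completion algorithms the paper evaluates (which are not reproduced here), the sketch below fills missing entries of a 3-way array by iterated truncated SVD of one mode unfolding, a technique often called "hard impute". The channels x time x trials layout, the rank, the synthetic data and the helper names are assumptions for illustration.

```python
import numpy as np


def unfold(T, mode):
    """Mode-n unfolding: rows indexed by `mode`, columns by the remaining modes."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)


def fold(M, mode, shape):
    """Inverse of unfold for a tensor of the given shape."""
    rest = [s for k, s in enumerate(shape) if k != mode]
    return np.moveaxis(M.reshape([shape[mode]] + rest), 0, mode)


def complete_lowrank(T_obs, mask, rank, mode=0, n_iter=500):
    """Fill entries where mask is False by iterated rank-`rank` SVD of one unfolding."""
    X = np.where(mask, T_obs, 0.0)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(unfold(X, mode), full_matrices=False)
        low = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        X = np.where(mask, T_obs, fold(low, mode, T_obs.shape))  # keep observed entries
    return X


rng = np.random.default_rng(1)
# Synthetic "channels x time x trials" tensor whose channel unfolding has rank 2.
T = np.einsum("cr,rtk->ctk", rng.standard_normal((8, 2)), rng.standard_normal((2, 50, 10)))
mask = rng.random(T.shape) > 0.1                 # about 10% of entries missing at random
T_hat = complete_lowrank(T, mask, rank=2)
```

This toy handles random missing entries; a whole missing channel in one trial is harder, because that information must then come entirely from the other trials and modes, which is where richer tensor models help.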
-
Numerical CP Decomposition of Some Difficult Tensors
Authors:
Petr Tichavsky,
Anh Huy Phan,
Andrzej Cichocki
Abstract:
In this paper, a numerical method is proposed for canonical polyadic (CP) decomposition of small tensors. The focus is primarily on decomposing tensors that correspond to small matrix multiplications. Here, the rank of such a tensor equals the smallest number of scalar multiplications necessary to accomplish the matrix multiplication. The proposed method is based on constrained Levenberg-Marquardt optimization. Numerical results indicate the ranks and border ranks of tensors that correspond to multiplication of matrices of sizes 2x3 and 3x2, 3x3 and 3x2, 3x3 and 3x3, and 3x4 and 4x3. The ranks are 11, 15, 23 and 29, respectively. In particular, a novel algorithm for multiplying matrices of sizes 3x3 and 3x2 with 15 multiplications is presented.
Submitted 4 March, 2016;
originally announced March 2016.
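To make the link between CP rank and multiplication counts concrete, the sketch below (NumPy; helper names are ours) builds the matrix multiplication tensor and multiplies matrices using any CP decomposition of it, with one scalar multiplication per rank-one term. Strassen's well-known rank-7 decomposition of the 2x2 case serves as the example; the paper's constrained Levenberg-Marquardt search for the larger cases is not reproduced.

```python
import numpy as np


def matmul_tensor(m, n, p):
    """Tensor T with vec(A @ B)[c] = sum_{a,b} T[a, b, c] vec(A)[a] vec(B)[b]
    for A (m x n) and B (n x p), all vectorized in row-major order."""
    T = np.zeros((m * n, n * p, m * p))
    for i in range(m):
        for j in range(n):
            for k in range(p):
                T[i * n + j, j * p + k, i * p + k] = 1.0
    return T


def matmul_via_cp(U, V, W, A, B):
    """Multiply using a CP decomposition T = sum_r U[r] o V[r] o W[r]:
    exactly one scalar multiplication per rank-one term."""
    products = (U @ A.ravel()) * (V @ B.ravel())   # R scalar multiplications
    return (products @ W).reshape(A.shape[0], B.shape[1])


# Strassen's rank-7 CP decomposition of the 2 x 2 x 2 case
# (vectorization order: X11, X12, X21, X22; row r corresponds to product M_{r+1}).
U = np.array([[1, 0, 0, 1], [0, 0, 1, 1], [1, 0, 0, 0], [0, 0, 0, 1],
              [1, 1, 0, 0], [-1, 0, 1, 0], [0, 1, 0, -1]], dtype=float)
V = np.array([[1, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, -1], [-1, 0, 1, 0],
              [0, 0, 0, 1], [1, 1, 0, 0], [0, 0, 1, 1]], dtype=float)
W = np.array([[1, 0, 0, 1], [0, 0, 1, -1], [0, 1, 0, 1], [1, 0, 1, 0],
              [-1, 1, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0]], dtype=float)
```

The naive algorithm corresponds to the trivial decomposition with m*n*p = 8 terms for the 2x2 case; the abstract's results play the same game for larger shapes, where 3x3 by 3x2 needs 15 instead of 18.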
-
Bayesian Sparse Tucker Models for Dimension Reduction and Tensor Completion
Authors:
Qibin Zhao,
Liqing Zhang,
Andrzej Cichocki
Abstract:
Tucker decomposition is the cornerstone of modern machine learning on tensorial data and has attracted considerable attention for multiway feature extraction, compressive sensing, and tensor completion. The most challenging problem is determining the model complexity (i.e., the multilinear rank), especially when noise and missing data are present. In addition, existing methods cannot take into account uncertainty information about latent factors, resulting in poor generalization performance. To address these issues, we present a class of probabilistic generative Tucker models for tensor decomposition and completion with structural sparsity over the multilinear latent space. To exploit structural sparse modeling, we introduce two group-sparsity-inducing priors via hierarchical representations of the Laplace and Student-t distributions, which facilitate full posterior inference. For model learning, we derived variational Bayesian inference over all model (hyper)parameters and developed efficient and scalable algorithms based on multilinear operations. Our methods can automatically adapt model complexity and infer an optimal multilinear rank by the principle of maximizing the lower bound of the model evidence. Experimental results and comparisons on synthetic, chemometrics and neuroimaging data demonstrate the remarkable performance of our models in recovering the ground-truth multilinear rank and missing entries.
Submitted 10 May, 2015;
originally announced May 2015.
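For contrast with the Bayesian approach above, which infers the multilinear rank automatically, here is a minimal sketch of the standard deterministic Tucker baseline, truncated higher-order SVD (HOSVD), in which the multilinear rank must be supplied by hand. This is not the paper's variational Bayesian method; the helper names and the synthetic tensor are illustrative.

```python
import numpy as np


def unfold(T, mode):
    """Mode-n unfolding: rows indexed by `mode`, columns by the remaining modes."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)


def mode_product(T, M, mode):
    """Multiply tensor T by matrix M along `mode` (the n-mode product T x_n M)."""
    return np.moveaxis(np.tensordot(M, T, axes=(1, mode)), 0, mode)


def hosvd(T, ranks):
    """Truncated HOSVD: orthonormal factors from each unfolding, then the core."""
    factors = [np.linalg.svd(unfold(T, n), full_matrices=False)[0][:, :r]
               for n, r in enumerate(ranks)]
    core = T
    for n, U in enumerate(factors):
        core = mode_product(core, U.T, n)
    return core, factors


def tucker_to_tensor(core, factors):
    """Rebuild the full tensor from a Tucker core and factor matrices."""
    T = core
    for n, U in enumerate(factors):
        T = mode_product(T, U, n)
    return T


rng = np.random.default_rng(0)
G = rng.standard_normal((2, 3, 2))
T = tucker_to_tensor(G, [rng.standard_normal((s, r)) for s, r in [(6, 2), (7, 3), (5, 2)]])
print([np.linalg.matrix_rank(unfold(T, n)) for n in range(3)])  # multilinear rank
core, factors = hosvd(T, ranks=(2, 3, 2))
```

Here the true multilinear rank is read off noise-free unfoldings; with noise and missing entries that shortcut fails, which is the gap the abstract's automatic rank inference targets.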
-
Log-Determinant Divergences Revisited: Alpha--Beta and Gamma Log-Det Divergences
Authors:
Andrzej Cichocki,
Sergio Cruces,
Shun-Ichi Amari
Abstract:
In this paper, we review and extend a family of log-det divergences for symmetric positive definite (SPD) matrices and discuss their fundamental properties. We show how to generate from the parameterized Alpha-Beta (AB) and Gamma log-det divergences many well-known divergences, for example, Stein's loss, the S-divergence (also called the Jensen-Bregman LogDet (JBLD) divergence), the LogDet Zero (Bhattacharyya) divergence and the Affine Invariant Riemannian Metric (AIRM), as well as some new divergences. Moreover, we establish links and correspondences among many log-det divergences and display them on the alpha-beta plane for various sets of parameters. Furthermore, this paper bridges these divergences and also shows their links to divergences of multivariate and multiway Gaussian distributions. Closed-form formulas are derived for gamma divergences of two multivariate Gaussian densities, including as special cases the Kullback-Leibler, Bhattacharyya, Rényi and Cauchy-Schwarz divergences. Symmetrized versions of the log-det divergences are also discussed and reviewed. A class of divergences is extended to multiway divergences for separable covariance (precision) matrices.
Submitted 23 December, 2014; v1 submitted 18 December, 2014;
originally announced December 2014.
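Three of the special cases named above have simple closed forms in terms of the eigenvalues λ_i of Q⁻¹P for SPD matrices P and Q. A minimal NumPy sketch of those three only (not the parameterized AB or Gamma families); the function names and the argument order D(P‖Q) for Stein's loss are our conventions.

```python
import numpy as np


def generalized_eigs(P, Q):
    """Eigenvalues of Q^{-1} P for SPD P, Q, computed via the Cholesky factor of Q."""
    L = np.linalg.cholesky(Q)
    Linv = np.linalg.inv(L)
    return np.linalg.eigvalsh(Linv @ P @ Linv.T)   # same spectrum as Q^{-1} P


def stein_loss(P, Q):
    """Stein's loss tr(P Q^{-1}) - log det(P Q^{-1}) - n = sum(lam - log lam - 1)."""
    lam = generalized_eigs(P, Q)
    return float(np.sum(lam - np.log(lam) - 1.0))


def jbld(P, Q):
    """S-divergence (JBLD): log det((P + Q) / 2) - 0.5 * log det(P Q)."""
    lam = generalized_eigs(P, Q)
    return float(np.sum(np.log((1.0 + lam) / 2.0) - 0.5 * np.log(lam)))


def airm(P, Q):
    """Affine-invariant Riemannian distance ||log(Q^{-1/2} P Q^{-1/2})||_F."""
    lam = generalized_eigs(P, Q)
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))
```

Working through the eigenvalues avoids forming matrix logarithms explicitly and makes the symmetry of JBLD and AIRM (λ versus 1/λ) easy to see, whereas Stein's loss is asymmetric.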
-
Efficient Nonnegative Tucker Decompositions: Algorithms and Uniqueness
Authors:
Guoxu Zhou,
Andrzej Cichocki,
Qibin Zhao,
Shengli Xie
Abstract:
Nonnegative Tucker decomposition (NTD) is a powerful tool for the extraction of nonnegative parts-based and physically meaningful latent components from high-dimensional tensor data while preserving the natural multilinear structure of data. However, as the data tensor often has multiple modes and is large-scale, existing NTD algorithms suffer from a very high computational complexity in terms of both storage and computation time, which has been one major obstacle for practical applications of NTD. To overcome these disadvantages, we show how low (multilinear) rank approximation (LRA) of tensors is able to significantly simplify the computation of the gradients of the cost function, upon which a family of efficient first-order NTD algorithms are developed. Besides dramatically reducing the storage complexity and running time, the new algorithms are quite flexible and robust to noise because any well-established LRA approaches can be applied. We also show how nonnegativity incorporating sparsity substantially improves the uniqueness property and partially alleviates the curse of dimensionality of the Tucker decompositions. Simulation results on synthetic and real-world data justify the validity and high efficiency of the proposed NTD algorithms.
Submitted 16 September, 2015; v1 submitted 16 April, 2014;
originally announced April 2014.
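The key computational idea in this abstract is that a low-rank approximation (LRA) of the data makes gradient evaluation cheap. A minimal sketch of that idea in the simpler matrix (NMF) setting, not the paper's Tucker algorithms: with Y ≈ P Qt of rank k, products such as Y Bᵀ are computed as P (Qt Bᵀ) at cost O((m + n)kr) instead of O(mnr). The function name, the projected-gradient solver and its step sizes are our assumptions.

```python
import numpy as np


def lra_nmf(Y, r, k, n_iter=300, seed=0):
    """Nonnegative factorization Y ~ A @ B (A >= 0, B >= 0) by projected gradient,
    where every product with Y uses a rank-k approximation Y ~ P @ Qt."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    P, Qt = U[:, :k] * s[:k], Vt[:k]            # low-rank approximation (LRA) of Y
    rng = np.random.default_rng(seed)
    A = rng.random((Y.shape[0], r))
    B = rng.random((r, Y.shape[1]))
    for _ in range(n_iter):
        # Gradient w.r.t. A: A (B B^T) - Y B^T, with Y B^T computed as P (Qt B^T).
        BBt = B @ B.T
        grad_A = A @ BBt - P @ (Qt @ B.T)
        A = np.maximum(A - grad_A / np.linalg.norm(BBt, 2), 0.0)   # step 1/Lipschitz
        # Gradient w.r.t. B: (A^T A) B - A^T Y, with A^T Y computed as (A^T P) Qt.
        AtA = A.T @ A
        grad_B = AtA @ B - (A.T @ P) @ Qt
        B = np.maximum(B - grad_B / np.linalg.norm(AtA, 2), 0.0)
    return A, B


rng = np.random.default_rng(1)
Y = rng.random((60, 3)) @ rng.random((3, 40))   # nonnegative and exactly rank 3
A, B = lra_nmf(Y, r=3, k=3)
print(np.linalg.norm(Y - A @ B) / np.linalg.norm(Y))
```

Because the solver only ever touches P and Qt, any LRA method (truncated SVD here, randomized or robust variants in practice) can be plugged in, mirroring the flexibility the abstract claims.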
-
Non-Orthogonal Tensor Diagonalization
Authors:
Petr Tichavsky,
Anh Huy Phan,
Andrzej Cichocki
Abstract:
Tensor diagonalization means transforming a given tensor to an exactly or nearly diagonal form by multiplying the tensor by non-orthogonal invertible matrices along selected dimensions. It is a generalization of approximate joint diagonalization (AJD) of a set of matrices. In particular, we derive (1) a new algorithm for symmetric AJD, called two-sided symmetric diagonalization of an order-3 tensor, (2) a similar algorithm for non-symmetric AJD, also called general two-sided diagonalization of an order-3 tensor, and (3) an algorithm for three-sided diagonalization of order-3 or order-4 tensors. The latter two algorithms may serve for canonical polyadic (CP) tensor decomposition, and they can outperform other CP tensor decomposition methods in computational speed under the restriction that the tensor rank does not exceed the tensor multilinear rank. Finally, we propose (4) similar algorithms for tensor block diagonalization, which is related to the tensor block-term decomposition.
Submitted 1 July, 2016; v1 submitted 7 February, 2014;
originally announced February 2014.
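A small illustration of why diagonalization can serve CP decomposition when the rank does not exceed the multilinear rank: if an order-3 tensor is built from square, invertible, non-orthogonal CP factors, multiplying it by the inverse factors along all three modes yields an exactly diagonal tensor holding the CP weights. The sketch assumes the factors are known; the paper's algorithms estimate such transforms from the tensor itself, which is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
R = 4
# Square, generically invertible, non-orthogonal factor matrices and CP weights.
A, B, C = (rng.standard_normal((R, R)) for _ in range(3))
d = rng.standard_normal(R)

# Order-3 tensor with CP rank R: T = sum_r d_r a_r o b_r o c_r.
T = np.einsum("r,ir,jr,kr->ijk", d, A, B, C)

# Multiply by the inverse factors along all three modes (three-sided diagonalization).
D = np.einsum("ia,jb,kc,abc->ijk", np.linalg.inv(A), np.linalg.inv(B), np.linalg.inv(C), T)

off_diag = D.copy()
off_diag[np.arange(R), np.arange(R), np.arange(R)] = 0.0
print(np.abs(off_diag).max())  # close to zero, up to floating-point rounding
```

Conversely, any set of invertible transforms that diagonalizes T hands back its CP factors as their inverses, which is the route from diagonalization to CP decomposition.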
-
Bayesian CP Factorization of Incomplete Tensors with Automatic Rank Determination
Authors:
Qibin Zhao,
Liqing Zhang,
Andrzej Cichocki
Abstract:
CANDECOMP/PARAFAC (CP) tensor factorization of incomplete data is a powerful technique for tensor completion that explicitly captures the multilinear latent factors. Existing CP algorithms require the tensor rank to be specified manually; however, determining the tensor rank remains a challenging problem, especially for CP rank. In addition, existing approaches do not take into account uncertainty information about latent factors or missing entries. To address these issues, we formulate CP factorization using a hierarchical probabilistic model and employ a fully Bayesian treatment by incorporating a sparsity-inducing prior over multiple latent factors and appropriate hyperpriors over all hyperparameters, resulting in automatic rank determination. To learn the model, we develop an efficient deterministic Bayesian inference algorithm that scales linearly with data size. Our method is a tuning-parameter-free approach that can effectively infer underlying multilinear factors with a low-rank constraint while also providing predictive distributions over missing entries. Extensive simulations on synthetic data illustrate the intrinsic capability of our method to recover the ground-truth CP rank and prevent overfitting, even when a large number of entries are missing. Moreover, results from real-world applications, including image inpainting and facial image synthesis, demonstrate that our method outperforms state-of-the-art approaches for both tensor factorization and tensor completion in terms of predictive performance.
Submitted 9 October, 2014; v1 submitted 25 January, 2014;
originally announced January 2014.
-
Frequency Recognition in SSVEP-based BCI using Multiset Canonical Correlation Analysis
Authors:
Yu Zhang,
Guoxu Zhou,
Jing Jin,
Xingyu Wang,
Andrzej Cichocki
Abstract:
Canonical correlation analysis (CCA) has been one of the most popular methods for frequency recognition in steady-state visual evoked potential (SSVEP)-based brain-computer interfaces (BCIs). Despite its efficiency, a potential problem is that using pre-constructed sine-cosine waves as the required reference signals in the CCA method often does not yield optimal recognition accuracy, because these references lack features of the real EEG data. To address this problem, this study proposes a novel method based on multiset canonical correlation analysis (MsetCCA) to optimize the reference signals used in the CCA method for SSVEP frequency recognition. The MsetCCA method learns multiple linear transforms that implement joint spatial filtering to maximize the overall correlation among canonical variates, and hence extracts SSVEP common features from multiple sets of EEG data recorded at the same stimulus frequency. The optimized reference signals are formed by combining the common features and are based entirely on training data. An experimental study with EEG data from ten healthy subjects demonstrates that the MsetCCA method improves the SSVEP frequency recognition accuracy compared with the CCA method and two other competing methods (multiway CCA (MwayCCA) and phase constrained CCA (PCCA)), especially for a small number of channels and a short time window. This superiority indicates that the proposed MsetCCA method is a promising new candidate for frequency recognition in SSVEP-based BCIs.
Submitted 16 January, 2014; v1 submitted 26 August, 2013;
originally announced August 2013.
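The baseline this paper improves on is the standard CCA method with pre-constructed sine-cosine references. A minimal NumPy sketch of that baseline, not of MsetCCA, whose learned references need multi-trial training data; the number of harmonics, the window length, the synthetic "EEG" and the helper names are assumptions for illustration.

```python
import numpy as np


def max_canonical_corr(X, Y):
    """Largest canonical correlation between the columns of X (T x p) and Y (T x q)."""
    Qx, _ = np.linalg.qr(X - X.mean(axis=0))
    Qy, _ = np.linalg.qr(Y - Y.mean(axis=0))
    return float(np.linalg.svd(Qx.T @ Qy, compute_uv=False)[0])


def sine_cosine_reference(freq, fs, n_samples, n_harmonics=2):
    """Pre-constructed sine-cosine reference signals at freq and its harmonics."""
    t = np.arange(n_samples) / fs
    cols = []
    for h in range(1, n_harmonics + 1):
        cols += [np.sin(2 * np.pi * h * freq * t), np.cos(2 * np.pi * h * freq * t)]
    return np.column_stack(cols)


def recognize_frequency(eeg, candidates, fs, n_harmonics=2):
    """Standard CCA method: pick the candidate whose reference correlates best with the EEG."""
    scores = [max_canonical_corr(eeg, sine_cosine_reference(f, fs, len(eeg), n_harmonics))
              for f in candidates]
    return candidates[int(np.argmax(scores))], scores


# Synthetic 4-channel "EEG": a 10 Hz response with channel-specific phase, plus noise.
fs, n = 250, 500
rng = np.random.default_rng(0)
t = np.arange(n) / fs
eeg = np.column_stack([np.sin(2 * np.pi * 10 * t + ph) for ph in rng.uniform(0, np.pi, 4)])
eeg += rng.standard_normal(eeg.shape)
best, scores = recognize_frequency(eeg, [8.0, 10.0, 12.0, 15.0], fs)
```

MsetCCA keeps the same "maximum canonical correlation" decision rule but replaces `sine_cosine_reference` with references learned from training trials recorded at each stimulus frequency.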
-
Tensor Decompositions: A New Concept in Brain Data Analysis?
Authors:
Andrzej Cichocki
Abstract:
Matrix factorizations and their extensions to tensor factorizations and decompositions have become prominent techniques for linear and multilinear blind source separation (BSS), especially multiway Independent Component Analysis (ICA), Nonnegative Matrix and Tensor Factorization (NMF/NTF), Smooth Component Analysis (SmoCA) and Sparse Component Analysis (SCA). Moreover, tensor decompositions have many other potential applications beyond multilinear BSS, especially feature extraction, classification, dimensionality reduction and multiway clustering. In this paper, we briefly overview new and emerging models and approaches for tensor decompositions in applications to group and linked multiway BSS/ICA, feature extraction, classification and Multiway Partial Least Squares (MPLS) regression problems. Keywords: Multilinear BSS, linked multiway BSS/ICA, tensor factorizations and decompositions, constrained Tucker and CP models, Penalized Tensor Decompositions (PTD), feature extraction, classification, multiway PLS and CCA.
Submitted 2 May, 2013;
originally announced May 2013.