-
The Fascinating World of 2 $\times$ 2 $\times$ 2 Tensors: Its Geometry and Optimization Challenges
Authors:
Gabriel H. Brown,
Joe Kileel,
Tamara G. Kolda
Abstract:
This educational article highlights the geometric and algebraic complexities that distinguish tensors from matrices, to supplement coverage in advanced courses on linear algebra, matrix analysis, and tensor decompositions. Using the case of real-valued 2 $\times$ 2 $\times$ 2 tensors, we show how tensors violate many well-known properties of matrices: (1) The rank of a matrix is bounded by its smallest dimension, but a 2 $\times$ 2 $\times$ 2 tensor can be rank 3. (2) Matrices have a single typical rank, but the rank of a generic 2 $\times$ 2 $\times$ 2 tensor can be 2 or 3 - it has two typical ranks. (3) Any limit point of a sequence of matrices of rank $r$ is at most rank $r$, but a limit point of a sequence of 2 $\times$ 2 $\times$ 2 tensors of rank 2 can be rank 3 (a higher rank). (4) Matrices always have a best rank-$r$ approximation, but no rank-3 tensor of size 2 $\times$ 2 $\times$ 2 has a best rank-2 approximation. We unify the analysis of the matrix and tensor cases using tools from algebraic geometry and optimization, providing derivations of these surprising facts. To build intuition for the geometry of rank-constrained sets, students and educators can explore the geometry of matrix and tensor ranks via our interactive visualization tool.
Submitted 4 April, 2025;
originally announced April 2025.
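Point (3) of the abstract above can be checked numerically. The following is a minimal sketch, not taken from the article: it uses the standard textbook tensor W = a⊗a⊗b + a⊗b⊗a + b⊗a⊗a, which has rank 3, and the rank-2 sequence n(a + b/n)⊗³ − n a⊗³, which converges to it. The names outer3, rank2_approx, and W are illustrative choices.

```python
import numpy as np

def outer3(u, v, w):
    """Rank-one 2 x 2 x 2 tensor formed as the outer product of u, v, w."""
    return np.einsum("i,j,k->ijk", u, v, w)

a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])

# Target tensor: a standard example of a 2 x 2 x 2 tensor of rank 3.
W = outer3(a, a, b) + outer3(a, b, a) + outer3(b, a, a)

def rank2_approx(n):
    """Sum of two rank-one terms (rank at most 2) that approaches W as n grows."""
    c = a + b / n
    return n * outer3(c, c, c) - n * outer3(a, a, a)

ns = [1, 10, 100, 1000, 10000]
errors = [np.linalg.norm(rank2_approx(n) - W) for n in ns]

for n, err in zip(ns, errors):
    print(f"n = {n:6d}   ||T_n - W||_F = {err:.3e}")
```

The error shrinks roughly like 1/n while the two rank-one terms grow like n, which is the usual signature of a target with no best rank-2 approximation.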
-
Tensor Decomposition Meets RKHS: Efficient Algorithms for Smooth and Misaligned Data
Authors:
Brett W. Larsen,
Tamara G. Kolda,
Anru R. Zhang,
Alex H. Williams
Abstract:
The canonical polyadic (CP) tensor decomposition decomposes a multidimensional data array into a sum of outer products of finite-dimensional vectors. Instead, we can replace some or all of the vectors with continuous functions (infinite-dimensional vectors) from a reproducing kernel Hilbert space (RKHS). We refer to tensors with some infinite-dimensional modes as quasitensors, and the approach of decomposing a tensor with some continuous RKHS modes is referred to as CP-HiFi (hybrid infinite and finite dimensional) tensor decomposition. An advantage of CP-HiFi is that it can enforce smoothness in the infinite dimensional modes. Further, CP-HiFi does not require the observed data to lie on a regular and finite rectangular grid and naturally incorporates misaligned data. We detail the methodology and illustrate it on a synthetic example.
Submitted 10 August, 2024;
originally announced August 2024.
-
Convergence of Alternating Gradient Descent for Matrix Factorization
Authors:
Rachel Ward,
Tamara G. Kolda
Abstract:
We consider alternating gradient descent (AGD) with fixed step size applied to the asymmetric matrix factorization objective. We show that, for a rank-$r$ matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$, $T = C (\frac{σ_1(\mathbf{A})}{σ_r(\mathbf{A})})^2 \log(1/ε)$ iterations of alternating gradient descent suffice to reach an $ε$-optimal factorization $\| \mathbf{A} - \mathbf{X} \mathbf{Y}^{T} \|^2 \leq ε\| \mathbf{A}\|^2$ with high probability starting from an atypical random initialization. The factors have rank $d \geq r$ so that $\mathbf{X}_{T}\in\mathbb{R}^{m \times d}$ and $\mathbf{Y}_{T} \in\mathbb{R}^{n \times d}$, and mild overparameterization suffices for the constant $C$ in the iteration complexity $T$ to be an absolute constant. Experiments suggest that our proposed initialization is not merely of theoretical benefit, but rather significantly improves the convergence rate of gradient descent in practice. Our proof is conceptually simple: a uniform Polyak-Łojasiewicz (PL) inequality and uniform Lipschitz smoothness constant are guaranteed for a sufficient number of iterations, starting from our random initialization. Our proof method should be useful for extending and simplifying convergence analyses for a broader class of nonconvex low-rank factorization problems.
Submitted 7 February, 2024; v1 submitted 11 May, 2023;
originally announced May 2023.
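The iteration described in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: the objective is taken to be 0.5‖A − XYᵀ‖²_F, the initialization is a plain small Gaussian one (not the specific random initialization the paper proposes), and the sizes, seed, and step size eta are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r, d = 6, 5, 2, 3          # d >= r: mildly overparameterized factors

# Rank-r target, scaled so that its largest singular value is 1.
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
A /= np.linalg.norm(A, 2)

# Assumption: plain small Gaussian initialization, NOT the paper's scheme.
X = 0.1 * rng.standard_normal((m, d))
Y = 0.1 * rng.standard_normal((n, d))
eta = 0.1                        # fixed step size (illustrative value)

def rel_err(X, Y):
    """Relative squared error ||A - X Y^T||_F^2 / ||A||_F^2."""
    return np.linalg.norm(A - X @ Y.T) ** 2 / np.linalg.norm(A) ** 2

history = [rel_err(X, Y)]
for _ in range(2000):
    # Gradient step in X with Y held fixed ...
    X = X - eta * (X @ Y.T - A) @ Y
    # ... then a gradient step in Y using the updated X.
    Y = Y - eta * (X @ Y.T - A).T @ X
    history.append(rel_err(X, Y))

print(f"relative error: start {history[0]:.3e}, end {history[-1]:.3e}")
```

The two updates are alternating rather than simultaneous: the Y step sees the X that was just updated.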
-
Scalable symmetric Tucker tensor decomposition
Authors:
Ruhui Jin,
Joe Kileel,
Tamara G. Kolda,
Rachel Ward
Abstract:
We study the best low-rank Tucker decomposition of symmetric tensors. The motivating application is decomposing higher-order multivariate moments. Moment tensors have special structure and are important to various data science problems. We advocate for the projected gradient descent (PGD) method and the higher-order eigenvalue decomposition (HOEVD) approximation as computation schemes. Most importantly, we develop scalable adaptations of the basic PGD and HOEVD methods to decompose sample moment tensors. With the help of implicit and streaming techniques, we evade the overhead cost of building and storing the moment tensor. Such reductions make computing the Tucker decomposition realizable for large data instances in high dimensions. Numerical experiments demonstrate the efficiency of the algorithms and the applicability of moment tensor decompositions to real-world datasets. Finally, we study the convergence on the Grassmannian manifold, and prove that the update sequence derived by the PGD solver achieves first- and second-order criticality.
Submitted 10 June, 2023; v1 submitted 22 April, 2022;
originally announced April 2022.
-
Tensor Moments of Gaussian Mixture Models: Theory and Applications
Authors:
João M. Pereira,
Joe Kileel,
Tamara G. Kolda
Abstract:
Gaussian mixture models (GMMs) are fundamental tools in statistical and data sciences. We study the moments of multivariate Gaussians and GMMs. The $d$-th moment of an $n$-dimensional random variable is a symmetric $d$-way tensor of size $n^d$, so working with moments naively is assumed to be prohibitively expensive for $d>2$ and larger values of $n$. In this work, we develop theory and numerical methods for \emph{implicit computations} with moment tensors of GMMs, reducing the computational and storage costs to $\mathcal{O}(n^2)$ and $\mathcal{O}(n^3)$, respectively, for general covariance matrices, and to $\mathcal{O}(n)$ and $\mathcal{O}(n)$, respectively, for diagonal ones. We derive concise analytic expressions for the moments in terms of symmetrized tensor products, relying on the correspondence between symmetric tensors and homogeneous polynomials, and combinatorial identities involving Bell polynomials. The primary application of this theory is to estimating GMM parameters (means and covariances) from a set of observations, when formulated as a moment-matching optimization problem. If there is a known and common covariance matrix, we also show it is possible to debias the data observations, in which case the problem of estimating the unknown means reduces to symmetric CP tensor decomposition. Numerical results validate and illustrate the numerical efficiency of our approaches. This work potentially opens the door to the competitiveness of the method of moments as compared to expectation maximization methods for parameter estimation of GMMs.
Submitted 21 March, 2022; v1 submitted 14 February, 2022;
originally announced February 2022.
-
Sketching Matrix Least Squares via Leverage Scores Estimates
Authors:
Brett W. Larsen,
Tamara G. Kolda
Abstract:
We consider the matrix least squares problem of the form $\| \mathbf{A} \mathbf{X}-\mathbf{B} \|_F^2$ where the design matrix $\mathbf{A} \in \mathbb{R}^{N \times r}$ is tall and skinny with $N \gg r$. We propose to create a sketched version $\| \tilde{\mathbf{A}}\mathbf{X}-\tilde{\mathbf{B}} \|_F^2$ where the sketched matrices $\tilde{\mathbf{A}}$ and $\tilde{\mathbf{B}}$ contain weighted subsets of the rows of $\mathbf{A}$ and $\mathbf{B}$, respectively. The subset of rows is determined via random sampling based on leverage score estimates for each row. We say that the sketched problem is $ε$-accurate if its solution $\tilde{\mathbf{X}}_{\rm \text{opt}} = \text{argmin } \| \tilde{\mathbf{A}}\mathbf{X}-\tilde{\mathbf{B}} \|_F^2$ satisfies $\|\mathbf{A}\tilde{\mathbf{X}}_{\rm \text{opt}}-\mathbf{B} \|_F^2 \leq (1+ε) \min \| \mathbf{A}\mathbf{X}-\mathbf{B} \|_F^2$ with high probability. We prove that the number of samples required for an $ε$-accurate solution is $O(r/(βε))$ where $β\in (0,1]$ is a measure of the quality of the leverage score estimates.
Submitted 25 January, 2022;
originally announced January 2022.
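The sketching scheme in the abstract above can be illustrated with a small example. This is a sketch under assumptions, not the paper's procedure: it uses exact leverage scores computed from a thin QR factorization (the paper works with estimates of quality β), and the problem sizes, the sample count s, and all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
N, r, k = 2000, 5, 3             # tall-and-skinny design matrix: N >> r
A = rng.standard_normal((N, r))
B = rng.standard_normal((N, k))

# Exact leverage scores: squared row norms of an orthonormal basis for range(A).
Q, _ = np.linalg.qr(A)           # reduced QR, Q is N x r
lev = np.sum(Q**2, axis=1)
probs = lev / lev.sum()

# Sample s rows with replacement and reweight so the sketch is unbiased.
s = 200
idx = rng.choice(N, size=s, replace=True, p=probs)
w = 1.0 / np.sqrt(s * probs[idx])
A_s = w[:, None] * A[idx]
B_s = w[:, None] * B[idx]

# Solve the full and the sketched least squares problems.
X_opt = np.linalg.lstsq(A, B, rcond=None)[0]
X_sk = np.linalg.lstsq(A_s, B_s, rcond=None)[0]

# Evaluate both solutions on the ORIGINAL problem.
res_opt = np.linalg.norm(A @ X_opt - B) ** 2
res_sk = np.linalg.norm(A @ X_sk - B) ** 2
ratio = res_sk / res_opt
print(f"residual ratio (sketched / optimal): {ratio:.4f}")
```

The printed ratio plays the role of 1 + ε in the abstract's accuracy definition; it can never fall below 1 because X_opt minimizes the full residual.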
-
Streaming Generalized Canonical Polyadic Tensor Decompositions
Authors:
Eric Phipps,
Nick Johnson,
Tamara G. Kolda
Abstract:
In this paper, we develop a method which we call OnlineGCP for computing the Generalized Canonical Polyadic (GCP) tensor decomposition of streaming data. GCP differs from traditional canonical polyadic (CP) tensor decompositions as it allows for arbitrary objective functions which the CP model attempts to minimize. This approach can provide better fits and more interpretable models when the observed tensor data is strongly non-Gaussian. In the streaming case, tensor data is gradually observed over time and the algorithm must incrementally update a GCP factorization with limited access to prior data. In this work, we extend the GCP formalism to the streaming context by deriving a GCP optimization problem to be solved as new tensor data is observed, formulate a tunable history term to balance reconstruction of recently observed data with data observed in the past, develop a scalable solution strategy based on segregated solves using stochastic gradient descent methods, describe a software implementation that provides performance and portability to contemporary CPU and GPU architectures and integrates with Matlab for enhanced usability, and demonstrate the utility and performance of the approach and software on several synthetic and real tensor data sets.
Submitted 27 October, 2021;
originally announced October 2021.
-
Practical Leverage-Based Sampling for Low-Rank Tensor Decomposition
Authors:
Brett W. Larsen,
Tamara G. Kolda
Abstract:
The low-rank canonical polyadic tensor decomposition is useful in data analysis and can be computed by solving a sequence of overdetermined least squares subproblems. Motivated by consideration of sparse tensors, we propose sketching each subproblem using leverage scores to select a subset of the rows, with probabilistic guarantees on the solution accuracy. We randomly sample rows proportional to leverage score upper bounds that can be efficiently computed using the special Khatri-Rao subproblem structure inherent in tensor decomposition. Crucially, for a $(d+1)$-way tensor, the number of rows in the sketched system is $O(r^d/ε)$ for a decomposition of rank $r$ and $ε$-accuracy in the least squares solve, independent of both the size and the number of nonzeros in the tensor. Along the way, we provide a practical solution to the generic matrix sketching problem of sampling overabundance for high-leverage-score rows, proposing to include such rows deterministically and combine repeated samples in the sketched system; we conjecture that this can lead to improved theoretical bounds. Numerical results on real-world large-scale tensors show the method is significantly faster than deterministic methods at nearly the same level of accuracy.
Submitted 3 January, 2022; v1 submitted 29 June, 2020;
originally announced June 2020.
-
Estimating Higher-Order Moments Using Symmetric Tensor Decomposition
Authors:
Samantha Sherman,
Tamara G. Kolda
Abstract:
We consider the problem of decomposing higher-order moment tensors, i.e., the sum of symmetric outer products of data vectors. Such a decomposition can be used to estimate the means in a Gaussian mixture model and for other applications in machine learning. The $d$th-order empirical moment tensor of a set of $p$ observations of $n$ variables is a symmetric $d$-way tensor. Our goal is to find a low-rank tensor approximation comprising $r \ll p$ symmetric outer products. The challenge is that forming the empirical moment tensors costs $O(pn^d)$ operations and $O(n^d)$ storage, which may be prohibitively expensive; additionally, the algorithm to compute the low-rank approximation costs $O(n^d)$ per iteration. Our contribution is avoiding formation of the moment tensor, computing the low-rank tensor approximation of the moment tensor implicitly using $O(pnr)$ operations per iteration and no extra memory. This advance opens the door to more applications of higher-order moments since they can now be efficiently computed. We present numerical evidence of the computational savings and show an example of estimating the means for higher-order moments.
Submitted 21 April, 2020; v1 submitted 9 November, 2019;
originally announced November 2019.
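The idea of working with the moment tensor without forming it rests on a simple identity: contracting the empirical moment tensor with copies of a vector a only requires the projections of the data onto a. The sketch below checks this for d = 3 and a single vector. It is an illustration of the identity, not the paper's algorithm, and the sizes and names are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 50, 4                     # p observations of n variables
X = rng.standard_normal((p, n))  # rows are the data vectors x_i
a = rng.standard_normal(n)

# Explicit route: form the n x n x n third-order empirical moment tensor.
M = np.einsum("pi,pj,pk->ijk", X, X, X) / p
explicit_inner = np.einsum("ijk,i,j,k->", M, a, a, a)   # <M, a (x) a (x) a>
explicit_vec = np.einsum("ijk,j,k->i", M, a, a)         # M contracted with a twice

# Implicit route: only the p projections x_i^T a are needed, O(p n) work.
proj = X @ a
implicit_inner = np.mean(proj**3)
implicit_vec = X.T @ (proj**2) / p

print(np.allclose(explicit_inner, implicit_inner),
      np.allclose(explicit_vec, implicit_vec))
```

For r vectors instead of one, the same projections cost O(pnr), which matches the per-iteration cost quoted in the abstract.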
-
Faster Johnson-Lindenstrauss Transforms via Kronecker Products
Authors:
Ruhui Jin,
Tamara G. Kolda,
Rachel Ward
Abstract:
The Kronecker product is an important matrix operation with a wide range of applications in supporting fast linear transforms, including signal processing, graph theory, quantum computing and deep learning. In this work, we introduce a generalization of the fast Johnson-Lindenstrauss projection for embedding vectors with Kronecker product structure, the Kronecker fast Johnson-Lindenstrauss transform (KFJLT). The KFJLT reduces the embedding cost to an exponential factor of the standard fast Johnson-Lindenstrauss transform (FJLT)'s cost when applied to vectors with Kronecker structure, by avoiding explicitly forming the full Kronecker products. We prove that this computational gain comes with only a small price in embedding power: given $N = \prod_{k=1}^d n_k$, consider a finite set of $p$ points in a tensor product of $d$ constituent Euclidean spaces $\bigotimes_{k=d}^{1}\mathbb{R}^{n_k} \subset \mathbb{R}^{N}$. With high probability, a random KFJLT matrix of dimension $N \times m$ embeds the set of points up to multiplicative distortion $(1\pm \varepsilon)$ provided by $m \gtrsim \varepsilon^{-2} \cdot \log^{2d - 1} (p) \cdot \log N$. We conclude by describing a direct application of the KFJLT to the efficient solution of large-scale Kronecker-structured least squares problems for fitting the CP tensor decomposition.
Submitted 30 July, 2020; v1 submitted 10 September, 2019;
originally announced September 2019.
-
Stochastic Gradients for Large-Scale Tensor Decomposition
Authors:
Tamara G. Kolda,
David Hong
Abstract:
Tensor decomposition is a well-known tool for multiway data analysis. This work proposes using stochastic gradients for efficient generalized canonical polyadic (GCP) tensor decomposition of large-scale tensors. GCP tensor decomposition is a recently proposed version of tensor decomposition that allows for a variety of loss functions such as Bernoulli loss for binary data or Huber loss for robust estimation. The stochastic gradient is formed from randomly sampled elements of the tensor and is efficient because it can be computed using the sparse matricized-tensor-times-Khatri-Rao product (MTTKRP) tensor kernel. For dense tensors, we simply use uniform sampling. For sparse tensors, we propose two types of stratified sampling that give precedence to sampling nonzeros. Numerical results demonstrate the advantages of the proposed approach and its scalability to large-scale problems.
Submitted 7 July, 2020; v1 submitted 4 June, 2019;
originally announced June 2019.
-
Generalized Canonical Polyadic Tensor Decomposition
Authors:
David Hong,
Tamara G. Kolda,
Jed A. Duersch
Abstract:
Tensor decomposition is a fundamental unsupervised machine learning method in data science, with applications including network analysis and sensor data processing. This work develops a generalized canonical polyadic (GCP) low-rank tensor decomposition that allows other loss functions besides squared error. For instance, we can use logistic loss or Kullback-Leibler divergence, enabling tensor decomposition for binary or count data. We present a variety of statistically motivated loss functions for various scenarios. We provide a generalized framework for computing gradients and handling missing data that enables the use of standard optimization methods for fitting the model. We demonstrate the flexibility of GCP on several real-world examples including interactions in a social network, neural activity in a mouse, and monthly rainfall measurements in India.
Submitted 21 January, 2019; v1 submitted 22 August, 2018;
originally announced August 2018.
-
A Practical Randomized CP Tensor Decomposition
Authors:
Casey Battaglino,
Grey Ballard,
Tamara G. Kolda
Abstract:
The CANDECOMP/PARAFAC (CP) decomposition is a leading method for the analysis of multiway data. The standard alternating least squares algorithm for the CP decomposition (CP-ALS) involves a series of highly overdetermined linear least squares problems. We extend randomized least squares methods to tensors and show the workload of CP-ALS can be drastically reduced without a sacrifice in quality. We introduce techniques for efficiently preprocessing, sampling, and computing randomized least squares on a dense tensor of arbitrary order, as well as an efficient sampling-based technique for checking the stopping condition. We also show more generally that the Khatri-Rao product (used within the CP-ALS iteration) produces conditions favorable for direct sampling. In numerical results, we see improvements in speed, reductions in memory requirements, and robustness with respect to initialization.
Submitted 22 October, 2017; v1 submitted 23 January, 2017;
originally announced January 2017.
-
Parallel Tensor Compression for Large-Scale Scientific Data
Authors:
Woody Austin,
Grey Ballard,
Tamara G. Kolda
Abstract:
As parallel computing trends towards the exascale, scientific data produced by high-fidelity simulations are growing increasingly massive. For instance, a simulation on a three-dimensional spatial grid with 512 points per dimension that tracks 64 variables per grid point for 128 time steps yields 8 TB of data, assuming double precision. By viewing the data as a dense five-way tensor, we can compute a Tucker decomposition to find inherent low-dimensional multilinear structure, achieving compression ratios of up to 5000 on real-world data sets with negligible loss in accuracy. So that we can operate on such massive data, we present the first-ever distributed-memory parallel implementation for the Tucker decomposition, whose key computations correspond to parallel linear algebra operations, albeit with nonstandard data layouts. Our approach specifies a data distribution for tensors that avoids any tensor data redistribution, either locally or in parallel. We provide accompanying analysis of the computation and communication costs of the algorithms. To demonstrate the compression and accuracy of the method, we apply our approach to real-world data sets from combustion science simulations. We also provide detailed performance results, including parallel performance in both weak and strong scaling experiments.
Submitted 23 February, 2016; v1 submitted 22 October, 2015;
originally announced October 2015.
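The storage figure quoted in the abstract above follows from straightforward arithmetic, worked through below. The only assumption is that "TB" is read as a binary terabyte (2^40 bytes); in decimal units the same count is about 8.8 × 10^12 bytes.

```python
# Five-way tensor: three spatial modes, one variable mode, one time mode.
shape = (512, 512, 512, 64, 128)

grid_points = 512 ** 3           # 2^27 points on the spatial grid
variables = 64                   # 2^6 variables per grid point
time_steps = 128                 # 2^7 time steps
bytes_per_double = 8             # 2^3 bytes in double precision

total_bytes = grid_points * variables * time_steps * bytes_per_double
total_tib = total_bytes / 2**40  # assumption: 1 TB taken as 2^40 bytes

print(f"{total_bytes} bytes = {total_tib} TiB")
```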
-
Symmetric Orthogonal Tensor Decomposition is Trivial
Authors:
Tamara G. Kolda
Abstract:
We consider the problem of decomposing a real-valued symmetric tensor as the sum of outer products of real-valued, pairwise orthogonal vectors. Such decompositions do not generally exist, but we show that some symmetric tensor decomposition problems can be converted to orthogonal problems following the whitening procedure proposed by Anandkumar et al. (2012). If an orthogonal decomposition of an $m$-way $n$-dimensional symmetric tensor exists, we propose a novel method to compute it that reduces to an $n \times n$ symmetric matrix eigenproblem. We provide numerical results demonstrating the effectiveness of the method.
Submitted 4 March, 2015;
originally announced March 2015.
-
Numerical Optimization for Symmetric Tensor Decomposition
Authors:
Tamara G. Kolda
Abstract:
We consider the problem of decomposing a real-valued symmetric tensor as the sum of outer products of real-valued vectors. Algebraic methods exist for computing complex-valued decompositions of symmetric tensors, but here we focus on real-valued decompositions, both unconstrained and nonnegative, for problems with low-rank structure. We discuss when solutions exist and how to formulate the mathematical program. Numerical results show the properties of the proposed formulations (including one that ignores symmetry) on a set of test problems and illustrate that these straightforward formulations can be effective even though the problem is nonconvex.
Submitted 19 February, 2015; v1 submitted 16 October, 2014;
originally announced October 2014.
-
An Adaptive Shifted Power Method for Computing Generalized Tensor Eigenpairs
Authors:
Tamara G. Kolda,
Jackson R. Mayo
Abstract:
Several tensor eigenpair definitions have been put forth in the past decade, but these can all be unified under the generalized tensor eigenpair framework introduced by Chang, Pearson, and Zhang (2009). Given mth-order, n-dimensional real-valued symmetric tensors A and B, the goal is to find $λ\in R$ and $x \in R^n$, $x \neq 0$, such that $Ax^{m-1} = λBx^{m-1}$. Different choices for B yield different versions of the tensor eigenvalue problem. We present our generalized eigenproblem adaptive power (GEAP) method for solving the problem, which is an extension of the shifted symmetric higher-order power method (SS-HOPM) for finding Z-eigenpairs. A major drawback of SS-HOPM was that its performance depended on choosing an appropriate shift, but our GEAP method also includes an adaptive method for choosing the shift automatically.
Submitted 9 June, 2014; v1 submitted 6 January, 2014;
originally announced January 2014.
-
Newton-Based Optimization for Kullback-Leibler Nonnegative Tensor Factorizations
Authors:
Samantha Hansen,
Todd Plantenga,
Tamara G. Kolda
Abstract:
Tensor factorizations with nonnegative constraints have found application in analyzing data from cyber traffic, social networks, and other areas. We consider application data best described as being generated by a Poisson process (e.g., count data), which leads to sparse tensors that can be modeled by sparse factor matrices. In this paper we investigate efficient techniques for computing an appropriate canonical polyadic tensor factorization based on the Kullback-Leibler divergence function. We propose novel subproblem solvers within the standard alternating block variable approach. Our new methods exploit structure and reformulate the optimization problem as small independent subproblems. We employ bound-constrained Newton and quasi-Newton methods. We compare our algorithms against other codes, demonstrating superior speed for high accuracy results and the ability to quickly find sparse solutions.
Submitted 10 November, 2014; v1 submitted 17 April, 2013;
originally announced April 2013.
-
Exploiting Symmetry in Tensors for High Performance: Multiplication with Symmetric Tensors
Authors:
Martin D. Schatz,
Tze Meng Low,
Robert A. van de Geijn,
Tamara G. Kolda
Abstract:
Symmetric tensor operations arise in a wide variety of computations. However, the benefits of exploiting symmetry in order to reduce storage and computation are in conflict with a desire to simplify memory access patterns. In this paper, we propose a blocked data structure (Blocked Compact Symmetric Storage) wherein we consider the tensor by blocks and store only the unique blocks of a symmetric tensor. We propose an algorithm-by-blocks, already shown to be of benefit for matrix computations, that exploits this storage format by utilizing a series of temporary tensors to avoid redundant computation. Further, partial symmetry within temporaries is exploited to further avoid redundant storage and redundant computation. A detailed analysis shows that, relative to storing and computing with tensors without taking advantage of symmetry and partial symmetry, storage requirements are reduced by a factor of $O\left( m! \right)$ and computational requirements by a factor of $O\left( (m+1)!/2^m \right)$, where $m$ is the order of the tensor. However, as the analysis shows, care must be taken in choosing the correct block size to ensure these storage and computational benefits are achieved (particularly for low-order tensors). An implementation demonstrates that storage is greatly reduced and the complexity introduced by storing and computing with tensors by blocks is manageable. Preliminary results demonstrate that computational time is also reduced. The paper concludes with a discussion of how insights in this paper point to opportunities for generalizing recent advances in the domain of linear algebra libraries to the field of multi-linear computation.
Submitted 9 April, 2014; v1 submitted 31 January, 2013;
originally announced January 2013.
-
On Tensors, Sparsity, and Nonnegative Factorizations
Authors:
Eric C. Chi,
Tamara G. Kolda
Abstract:
Tensors have found application in a variety of fields, ranging from chemometrics to signal processing and beyond. In this paper, we consider the problem of multilinear modeling of sparse count data. Our goal is to develop a descriptive tensor factorization model of such data, along with appropriate algorithms and theory. To do so, we propose that the random variation is best described via a Poisson distribution, which better describes the zeros observed in the data as compared to the typical assumption of a Gaussian distribution. Under a Poisson assumption, we fit a model to observed data using the negative log-likelihood score. We present a new algorithm for Poisson tensor factorization called CANDECOMP-PARAFAC Alternating Poisson Regression (CP-APR) that is based on a majorization-minimization approach. It can be shown that CP-APR is a generalization of the Lee-Seung multiplicative updates. We show how to prevent the algorithm from converging to non-KKT points and prove convergence of CP-APR under mild conditions. We also explain how to implement CP-APR for large-scale sparse tensors and present results on several data sets, both real and simulated.
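The abstract relates CP-APR to the Lee-Seung multiplicative updates. As a minimal sketch of that matrix special case only (not CP-APR itself), the code below applies the Lee-Seung updates for the Poisson negative log-likelihood of a nonnegative factorization $X \approx WH$. The names `poisson_loss` and `lee_seung_kl_step`, and the small constant `eps` guarding against division by zero, are assumptions made for illustration.

```python
import numpy as np


def poisson_loss(X, W, H, eps=1e-12):
    """Poisson negative log-likelihood of X under mean W @ H, up to a constant."""
    M = W @ H
    return float(np.sum(M - X * np.log(M + eps)))


def lee_seung_kl_step(X, W, H, eps=1e-12):
    """One pass of Lee-Seung multiplicative updates (matrix case, KL/Poisson loss)."""
    # Update W with H fixed; the denominator holds the row sums of H.
    W = W * ((X / (W @ H + eps)) @ H.T) / (H.sum(axis=1) + eps)
    # Update H with the new W fixed; the denominator holds the column sums of W.
    H = H * (W.T @ (X / (W @ H + eps))) / (W.sum(axis=0)[:, None] + eps)
    return W, H


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.poisson(3.0, size=(6, 5)).astype(float)
    W = rng.random((6, 2)) + 0.1
    H = rng.random((2, 5)) + 0.1
    for it in range(20):
        W, H = lee_seung_kl_step(X, W, H)
        print(it, poisson_loss(X, W, H))
```

Because the updates are multiplicative, factors that start positive stay nonnegative, and an entry that reaches zero stays at zero, which is one way such an iteration can stall at a non-KKT point.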
Submitted 14 August, 2012; v1 submitted 11 December, 2011;
originally announced December 2011.
-
All-at-once Optimization for Coupled Matrix and Tensor Factorizations
Authors:
Evrim Acar,
Tamara G. Kolda,
Daniel M. Dunlavy
Abstract:
Joint analysis of data from multiple sources has the potential to improve our understanding of the underlying structures in complex data sets. For instance, in restaurant recommendation systems, recommendations can be based on rating histories of customers. In addition to rating histories, customers' social networks (e.g., Facebook friendships) and restaurant category information (e.g., Thai or Italian) can also be used to make better recommendations. The task of fusing data, however, is challenging since data sets can be incomplete and heterogeneous, i.e., data consist of both matrices, e.g., the person-by-person social network matrix or the restaurant-by-category matrix, and higher-order tensors, e.g., the "ratings" tensor of the form restaurant by meal by person.
In this paper, we are particularly interested in fusing data sets with the goal of capturing their underlying latent structures. We formulate this problem as a coupled matrix and tensor factorization (CMTF) problem where heterogeneous data sets are modeled by fitting outer-product models to higher-order tensors and matrices in a coupled manner. Unlike traditional approaches solving this problem using alternating algorithms, we propose an all-at-once optimization approach called CMTF-OPT (CMTF-OPTimization), which is a gradient-based optimization approach for joint analysis of matrices and higher-order tensors. We also extend the algorithm to handle coupled incomplete data sets. Using numerical experiments, we demonstrate that the proposed all-at-once approach is more accurate than the alternating least squares approach.
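A coupled objective of the kind described above can be sketched as follows. This is an illustration under stated assumptions, not the paper's CMTF-OPT code: a three-way tensor `X` and a matrix `Y` are assumed to share the first-mode factor `A`, both terms get equal weight, and all function names are hypothetical. An all-at-once method would pass the objective and the gradients for every factor to a gradient-based optimizer; only the gradient for the shared factor is shown.

```python
import numpy as np


def cp_reconstruct(A, B, C):
    """Sum of R outer products: M[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)


def cmtf_objective(X, Y, A, B, C, V):
    """0.5 * ||X - [[A, B, C]]||^2 + 0.5 * ||Y - A V^T||^2 (assumed coupling on A)."""
    tensor_resid = X - cp_reconstruct(A, B, C)
    matrix_resid = Y - A @ V.T
    return 0.5 * np.sum(tensor_resid ** 2) + 0.5 * np.sum(matrix_resid ** 2)


def cmtf_grad_A(X, Y, A, B, C, V):
    """Gradient of the objective with respect to the shared factor A."""
    tensor_resid = X - cp_reconstruct(A, B, C)
    matrix_resid = Y - A @ V.T
    # Both data sets contribute, because A appears in both models.
    return -np.einsum("ijk,jr,kr->ir", tensor_resid, B, C) - matrix_resid @ V
```

The gradient for `A` collects a term from each data set, which is the sense in which the two factorizations are solved jointly rather than one factor at a time.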
Submitted 17 May, 2011;
originally announced May 2011.
-
Making Tensor Factorizations Robust to Non-Gaussian Noise
Authors:
Eric C. Chi,
Tamara G. Kolda
Abstract:
Tensors are multi-way arrays, and the Candecomp/Parafac (CP) tensor factorization has found application in many different domains. The CP model is typically fit using a least squares objective function, which is a maximum likelihood estimate under the assumption of i.i.d. Gaussian noise. We demonstrate that this loss function can actually be highly sensitive to non-Gaussian noise. Therefore, we propose a loss function based on the 1-norm because it can accommodate both Gaussian and grossly non-Gaussian perturbations. We also present an alternating majorization-minimization algorithm for fitting a CP model using our proposed loss function.
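The sensitivity of least squares to gross outliers, and the relative robustness of a 1-norm loss, can be seen in a one-parameter stand-in for the CP model: fitting a single constant to data. This toy case is an assumption chosen for brevity and is not the paper's algorithm. The least squares minimizer is the mean, while the 1-norm minimizer is the median.

```python
import numpy as np


def l2_fit(y):
    """Constant c minimizing sum((y - c)^2): the sample mean."""
    return float(np.mean(y))


def l1_fit(y):
    """Constant c minimizing sum(|y - c|): the sample median."""
    return float(np.median(y))


if __name__ == "__main__":
    # Four well-behaved observations near 1 and one grossly corrupted value.
    y = np.array([1.0, 1.1, 0.9, 1.0, 100.0])
    print("least squares fit:", l2_fit(y))
    print("1-norm fit:", l1_fit(y))
```

A single corrupted value pulls the least squares fit far from the bulk of the data, while the 1-norm fit stays with the majority.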
Submitted 14 October, 2010;
originally announced October 2010.
-
Shifted Power Method for Computing Tensor Eigenpairs
Authors:
Tamara G. Kolda,
Jackson R. Mayo
Abstract:
Recent work on eigenvalues and eigenvectors for tensors of order $m \geq 3$ has been motivated by applications in blind source separation, magnetic resonance imaging, molecular conformation, and more. In this paper, we consider methods for computing real symmetric-tensor eigenpairs of the form $Ax^{m-1} = \lambda x$ subject to $\|x\| = 1$, which is closely related to optimal rank-1 approximation of a symmetric tensor. Our contribution is a shifted symmetric higher-order power method (SS-HOPM), which we show is guaranteed to converge to a tensor eigenpair. SS-HOPM can be viewed as a generalization of the power iteration method for matrices or of the symmetric higher-order power method. Additionally, using fixed point analysis, we can characterize exactly which eigenpairs can and cannot be found by the method. Numerical examples are presented, including examples from an extension of the method to finding complex eigenpairs.
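A shifted power iteration of the kind described above can be sketched for a symmetric third-order tensor ($m = 3$). This is a minimal illustration that assumes a nonnegative shift `alpha` chosen large enough for the iteration to converge. The function name `ss_hopm`, the stopping rule, and the test tensor are assumptions, not the authors' implementation.

```python
import numpy as np


def ss_hopm(A, x0, alpha, tol=1e-12, max_iter=1000):
    """Shifted symmetric power iteration for an order-3 symmetric tensor A.

    Repeats x <- (A x^2 + alpha * x) / ||A x^2 + alpha * x|| with alpha >= 0,
    where (A x^2)_i = sum_{j,k} A[i, j, k] * x[j] * x[k].
    """
    x = x0 / np.linalg.norm(x0)
    lam = x @ np.einsum("ijk,j,k->i", A, x, x)
    for _ in range(max_iter):
        y = np.einsum("ijk,j,k->i", A, x, x) + alpha * x
        x = y / np.linalg.norm(y)
        lam_new = x @ np.einsum("ijk,j,k->i", A, x, x)
        converged = abs(lam_new - lam) < tol
        lam = lam_new
        if converged:
            break
    return lam, x


if __name__ == "__main__":
    # Symmetric test tensor 2 * e1^{(3)} + 1 * e2^{(3)}; (2, e1) is an eigenpair.
    A = np.zeros((3, 3, 3))
    A[0, 0, 0] = 2.0
    A[1, 1, 1] = 1.0
    lam, x = ss_hopm(A, np.ones(3), alpha=5.0)
    print(lam, x)
```

With `alpha = 0` this reduces to the unshifted symmetric higher-order power method; the shift is what the abstract credits for the convergence guarantee.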
Submitted 22 February, 2011; v1 submitted 7 July, 2010;
originally announced July 2010.
-
Temporal Link Prediction using Matrix and Tensor Factorizations
Authors:
Daniel M. Dunlavy,
Tamara G. Kolda,
Evrim Acar
Abstract:
The data in many disciplines such as social networks, web analysis, etc. is link-based, and the link structure can be exploited for many different data mining tasks. In this paper, we consider the problem of temporal link prediction: Given link data for times 1 through T, can we predict the links at time T+1? If our data has underlying periodic structure, can we predict out even further in time, i.e., links at time T+2, T+3, etc.? In this paper, we consider bipartite graphs that evolve over time and consider matrix- and tensor-based methods for predicting future links. We present a weight-based method for collapsing multi-year data into a single matrix. We show how the well-known Katz method for link prediction can be extended to bipartite graphs and, moreover, approximated in a scalable way using a truncated singular value decomposition. Using a CANDECOMP/PARAFAC tensor decomposition of the data, we illustrate the usefulness of exploiting the natural three-dimensional structure of temporal link data. Through several numerical experiments, we demonstrate that both matrix- and tensor-based techniques are effective for temporal link prediction despite the inherent difficulty of the problem. Additionally, we show that tensor-based techniques are particularly effective for temporal data with varying periodic patterns.
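The Katz extension to bipartite graphs via a truncated SVD can be sketched as follows. For a bipartite graph with biadjacency matrix $B = U \Sigma V^T$, only odd-length paths connect the two node sets, so the Katz scores between them sum to $U \psi(\Sigma) V^T$ with $\psi(\sigma) = \beta\sigma / (1 - \beta^2\sigma^2)$, which requires $\beta\sigma_{\max} < 1$. The function name `bipartite_katz` and its `rank` argument are illustrative assumptions rather than the authors' code.

```python
import numpy as np


def bipartite_katz(B, beta, rank=None):
    """Katz scores between the two node sets of a bipartite graph.

    B is the biadjacency matrix and beta the damping factor (beta * sigma_max < 1).
    If rank is given, only the leading `rank` singular triplets are kept.
    """
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    if rank is not None:
        U, s, Vt = U[:, :rank], s[:rank], Vt[:rank, :]
    # Sum of beta^(2k+1) * sigma^(2k+1) over k >= 0, as a geometric series.
    psi = beta * s / (1.0 - (beta * s) ** 2)
    return (U * psi) @ Vt


if __name__ == "__main__":
    B = np.array([[1.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0]])
    print(bipartite_katz(B, beta=0.2))          # exact scores
    print(bipartite_katz(B, beta=0.2, rank=1))  # rank-1 approximation
```

Keeping only the leading singular triplets is what makes the score computation scalable, at the cost of an approximation.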
Submitted 19 June, 2010; v1 submitted 21 May, 2010;
originally announced May 2010.
-
Scalable Tensor Factorizations for Incomplete Data
Authors:
Evrim Acar,
Tamara G. Kolda,
Daniel M. Dunlavy,
Morten Morup
Abstract:
The problem of incomplete data - i.e., data with missing or unknown values - in multi-way arrays is ubiquitous in biomedical signal processing, network traffic analysis, bibliometrics, social network analysis, chemometrics, computer vision, communication networks, etc. We consider the problem of how to factorize data sets with missing values with the goal of capturing the underlying latent structure of the data and possibly reconstructing missing values (i.e., tensor completion). We focus on one of the most well-known tensor factorizations that captures multi-linear structure, CANDECOMP/PARAFAC (CP). In the presence of missing data, CP can be formulated as a weighted least squares problem that models only the known entries. We develop an algorithm called CP-WOPT (CP Weighted OPTimization) that uses a first-order optimization approach to solve the weighted least squares problem. Based on extensive numerical experiments, our algorithm is shown to successfully factorize tensors with noise and up to 99% missing data. A unique aspect of our approach is that it scales to sparse large-scale data, e.g., $1000 \times 1000 \times 1000$ with five million known entries (0.5% dense). We further demonstrate the usefulness of CP-WOPT on two real-world applications: a novel EEG (electroencephalogram) application, where missing data is frequently encountered due to disconnections of electrodes, and the problem of modeling computer network traffic, where data may be absent due to the expense of the data collection process.
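The weighted least squares formulation can be sketched for a three-way tensor as below. The weight tensor `W` holds 1 for known entries and 0 for missing ones, so missing values never enter the objective. This is a dense illustration only: the function name is hypothetical, and it does not reflect the sparse implementation that lets the method scale. A first-order optimizer would consume the returned value and gradients.

```python
import numpy as np


def cp_wopt_objective_and_grads(X, W, A, B, C):
    """Weighted CP objective 0.5 * ||W * (X - [[A, B, C]])||^2 and its gradients."""
    M = np.einsum("ir,jr,kr->ijk", A, B, C)   # CP model
    R = W * (X - M)                           # residual on known entries only
    f = 0.5 * np.sum(R ** 2)
    T = W * R                                 # equals R when W is binary
    gA = -np.einsum("ijk,jr,kr->ir", T, B, C)
    gB = -np.einsum("ijk,ir,kr->jr", T, A, C)
    gC = -np.einsum("ijk,ir,jr->kr", T, A, B)
    return f, gA, gB, gC
```

Since the residual is masked before it is used, the values stored at missing positions of `X` are arbitrary placeholders and have no effect on the objective or the gradients.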
Submitted 12 May, 2010;
originally announced May 2010.