-
Matrices over a Hilbert space and their low-rank approximation
Authors:
Stanislav Budzinskiy
Abstract:
Matrices are typically considered over fields or rings. Motivated by applications in parametric differential equations and data-driven modeling, we propose studying matrices with entries from a Hilbert space and present an elementary theory of them: from basic properties to low-rank approximation. Specifically, we extend the idea of cross approximation to such matrices and propose an analogue of the adaptive cross approximation algorithm. Our numerical experiments show that this approach can achieve quasioptimal approximation and be integrated with existing computational software for partial differential equations.
Submitted 8 May, 2025;
originally announced May 2025.
-
Numerical Error Analysis of Large Language Models
Authors:
Stanislav Budzinskiy,
Wenyi Fang,
Longbin Zeng,
Philipp Petersen
Abstract:
Large language models based on transformer architectures have become integral to state-of-the-art natural language processing applications. However, their training remains computationally expensive and exhibits instabilities, some of which are expected to be caused by finite-precision computations. We provide a theoretical analysis of the impact of round-off errors within the forward pass of a transformer architecture which yields fundamental bounds for these effects. In addition, we conduct a series of numerical experiments which demonstrate the practical relevance of our bounds. Our results yield concrete guidelines for choosing hyperparameters that mitigate round-off errors, leading to more robust and stable inference.
Submitted 13 March, 2025;
originally announced March 2025.
-
When big data actually are low-rank, or entrywise approximation of certain function-generated matrices
Authors:
Stanislav Budzinskiy
Abstract:
The article concerns low-rank approximation of matrices generated by sampling a smooth function of two $m$-dimensional variables. We identify several misconceptions surrounding a claim that, for a specific class of analytic functions, such $n \times n$ matrices admit accurate entrywise approximation of rank that is independent of $m$ and grows as $\log(n)$ -- colloquially known as "big-data matrices are approximately low-rank". We provide a theoretical explanation of the numerical results presented in support of this claim, describing three narrower classes of functions for which function-generated matrices can be approximated within an entrywise error of order $\varepsilon$ with rank $\mathcal{O}(\log(n) \varepsilon^{-2} \log(\varepsilon^{-1}))$ that is independent of the dimension $m$: (i) functions of the inner product of the two variables, (ii) functions of the Euclidean distance between the variables, and (iii) shift-invariant positive-definite kernels. We extend our argument to tensor-train approximation of tensors generated with functions of the "higher-order inner product" of their multiple variables. We discuss our results in the context of low-rank approximation of (a) growing datasets and (b) attention in transformer neural networks.
Submitted 15 April, 2025; v1 submitted 3 July, 2024;
originally announced July 2024.
-
Entrywise tensor-train approximation of large tensors via random embeddings
Authors:
Stanislav Budzinskiy
Abstract:
The theory of low-rank tensor-train approximation is well understood when the approximation error is measured in the Frobenius norm. The entrywise maximum norm is equally important but is significantly weaker for large tensors, making the estimates obtained via the Frobenius norm and norm equivalence pessimistic or even meaningless. In this article, we derive a direct estimate of the entrywise approximation error that is applicable in some of these cases. The estimate is given in terms of the higher-order generalization of the matrix factorization norm, and its proof is based on the tensor-structured Hanson--Wright inequality. The theoretical results are accompanied by numerical experiments carried out with the method of alternating projections.
Submitted 31 March, 2025; v1 submitted 18 March, 2024;
originally announced March 2024.
-
On the distance to low-rank matrices in the maximum norm
Authors:
Stanislav Budzinskiy
Abstract:
Every sufficiently big matrix with small spectral norm has a nearby low-rank matrix if the distance is measured in the maximum norm (Udell & Townsend, SIAM J Math Data Sci, 2019). We use the Hanson--Wright inequality to improve the estimate of the distance for matrices with incoherent column and row spaces. In numerical experiments with several classes of matrices we study how well the theoretical upper bound describes the approximation errors achieved with the method of alternating projections.
Submitted 19 February, 2024; v1 submitted 20 December, 2023;
originally announced December 2023.
-
Quasioptimal alternating projections and their use in low-rank approximation of matrices and tensors
Authors:
Stanislav Budzinskiy
Abstract:
We study the convergence of specific inexact alternating projections for two non-convex sets in a Euclidean space. The $σ$-quasioptimal metric projection ($σ\geq 1$) of a point $x$ onto a set $A$ consists of the points in $A$ whose distance to $x$ is at most $σ$ times the minimal distance $\mathrm{dist}(x,A)$. We prove that quasioptimal alternating projections, when one or both projections are quasioptimal, converge locally and linearly for super-regular sets with transversal intersection. The theory is motivated by the successful application of alternating projections to low-rank matrix and tensor approximation. We focus on two problems -- nonnegative low-rank approximation and low-rank approximation in the maximum norm -- and develop fast alternating-projection algorithms for matrices and tensor trains based on cross approximation and acceleration techniques. The numerical experiments confirm that the proposed methods are efficient and suggest that they can be used to regularise various low-rank computational routines.
Submitted 23 June, 2025; v1 submitted 30 August, 2023;
originally announced August 2023.
-
Low-rank nonnegative tensor approximation via alternating projections and sketching
Authors:
Azamat Sultonov,
Sergey Matveev,
Stanislav Budzinskiy
Abstract:
We show how to construct nonnegative low-rank approximations of nonnegative tensors in Tucker and tensor train formats. We use alternating projections between the nonnegative orthant and the set of low-rank tensors, using STHOSVD and TTSVD algorithms, respectively, and further accelerate the alternating projections using randomized sketching. The numerical experiments on both synthetic data and hyperspectral images show the decay of the negative elements and that the error of the resulting approximation is close to the initial error obtained with STHOSVD and TTSVD. The proposed method for the Tucker case is superior to the previous ones in terms of computational complexity and decay of negative elements. The tensor train case, to the best of our knowledge, has not been studied before.
Submitted 24 April, 2023; v1 submitted 5 September, 2022;
originally announced September 2022.
-
Variational Bayesian inference for CP tensor completion with side information
Authors:
Stanislav Budzinskiy,
Nikolai Zamarashkin
Abstract:
We propose a message passing algorithm, based on variational Bayesian inference, for low-rank tensor completion with automatic rank determination in the canonical polyadic format when additional side information (SI) is given. The SI comes in the form of low-dimensional subspaces that contain the fiber spans of the tensor (columns, rows, tubes, etc.). We validate the regularization properties induced by SI with extensive numerical experiments on synthetic and real-world data and present results on tensor recovery and rank determination. The results show that the number of samples required for successful completion is significantly reduced in the presence of SI. We also discuss the origin of a bump in the phase transition curves that exists when the dimensionality of SI is comparable with that of the tensor.
Submitted 29 June, 2022; v1 submitted 24 June, 2022;
originally announced June 2022.
-
Sketching for low-rank nonnegative matrix approximation: Numerical study
Authors:
Sergey A. Matveev,
Stanislav Budzinskiy
Abstract:
We propose new approximate alternating projection methods, based on randomized sketching, for the low-rank nonnegative matrix approximation problem: find a low-rank approximation of a nonnegative matrix that is nonnegative, but whose factors can be arbitrary. We calculate the computational complexities of the proposed methods and evaluate their performance in numerical experiments. The comparison with the known deterministic alternating projection methods shows that the randomized approaches are faster and exhibit similar convergence properties.
Submitted 24 April, 2023; v1 submitted 26 January, 2022;
originally announced January 2022.
-
Tensor train completion: local recovery guarantees via Riemannian optimization
Authors:
Stanislav Budzinskiy,
Nikolai Zamarashkin
Abstract:
In this work, we estimate the number of randomly selected elements of a tensor that with high probability guarantees local convergence of Riemannian gradient descent for tensor train completion. We derive a new bound for the orthogonal projections onto the tangent spaces based on the harmonic mean of the unfoldings' singular values and introduce a notion of core coherence for tensor trains. We also extend the results to tensor train completion with auxiliary subspace information and obtain the corresponding local convergence guarantees.
Submitted 30 August, 2023; v1 submitted 8 October, 2021;
originally announced October 2021.
-
Hopf bifurcation in addition-shattering kinetics
Authors:
Stanislav S. Budzinskiy,
Sergey A. Matveev,
Pavel L. Krapivsky
Abstract:
In aggregation-fragmentation processes, a steady state is usually reached in the long time limit. This indicates the existence of a fixed point in the underlying system of ordinary differential equations. The next simplest possibility is an asymptotically periodic motion. Never-ending oscillations have not been rigorously established so far, although oscillations have been recently numerically detected in a few systems. For a class of addition-shattering processes, we provide convincing numerical evidence for never-ending oscillations in a certain region $\mathcal{U}$ of the parameter space. The processes which we investigate admit a fixed point that becomes unstable when parameters belong to $\mathcal{U}$ and never-ending oscillations effectively emerge through a Hopf bifurcation.
Submitted 16 December, 2020;
originally announced December 2020.
-
Note: low-rank tensor train completion with side information based on Riemannian optimization
Authors:
Stanislav Budzinskiy,
Nikolai Zamarashkin
Abstract:
We consider the low-rank tensor train completion problem when additional side information is available in the form of subspaces that contain the mode-$k$ fiber spans. We propose an algorithm based on Riemannian optimization to solve the problem. Numerical experiments show that the proposed algorithm requires far fewer known entries to recover the tensor compared to standard tensor train completion methods.
Submitted 23 June, 2020;
originally announced June 2020.
-
Zeros of Bessel cross-products coming from oblique derivative boundary value problems
Authors:
Stanislav Budzinskiy
Abstract:
The paper is devoted to (combinations of) Bessel cross-products that arise from oblique derivative boundary value problems for the Laplacian in a circular annulus. We show that like their Neumann-Laplacian counterpart (and unlike the Dirichlet-Laplacian), they possess two kinds of zeros: those that can be derived by McMahon series and diverge to infinity in the limit, and exceptional ones that remain finite. For both cases we find asymptotic expressions for a fixed oblique angle and vanishing thickness of the annulus. We further present plots of numerically computed zeros and discuss their behaviour when the oblique angle changes and the thickness remains fixed.
Submitted 31 August, 2019;
originally announced September 2019.