Matrices over a Hilbert space and their low-rank approximation

Abstract: Matrices are typically considered over fields or rings. Motivated by applications in parametric differential equations and data-driven modeling, we suggest studying matrices with entries from a Hilbert space and present an elementary theory of them: from basic properties to low-rank approximation. Specifically, we extend the idea of cross approximation to such matrices and propose an analogue of the adaptive cross approximation algorithm. Our numerical experiments show that this approach can achieve quasioptimal approximation and be integrated with the existing computational software for partial differential equations.

Submitted 8 May, 2025; originally announced May 2025.

arXiv:2503.10251 [pdf, other]

Numerical Error Analysis of Large Language Models

Authors: Stanislav Budzinskiy, Wenyi Fang, Longbin Zeng, Philipp Petersen

Abstract: Large language models based on transformer architectures have become integral to state-of-the-art natural language processing applications. However, their training remains computationally expensive and exhibits instabilities, some of which are expected to be caused by finite-precision computations. We provide a theoretical analysis of the impact of round-off errors within the forward pass of a transformer architecture which yields fundamental bounds for these effects. In addition, we conduct a series of numerical experiments which demonstrate the practical relevance of our bounds. Our results yield concrete guidelines for choosing hyperparameters that mitigate round-off errors, leading to more robust and stable inference.

Submitted 13 March, 2025; originally announced March 2025.

arXiv:2407.03250 [pdf, other]

When big data actually are low-rank, or entrywise approximation of certain function-generated matrices

Authors: Stanislav Budzinskiy

Abstract: The article concerns low-rank approximation of matrices generated by sampling a smooth function of two $m$-dimensional variables. We identify several misconceptions surrounding a claim that, for a specific class of analytic functions, such $n \times n$ matrices admit accurate entrywise approximation of rank that is independent of $m$ and grows as $\log(n)$ -- colloquially known as ''big-data matrices are approximately low-rank''. We provide a theoretical explanation of the numerical results presented in support of this claim, describing three narrower classes of functions for which function-generated matrices can be approximated within an entrywise error of order $\varepsilon$ with rank $\mathcal{O}(\log(n) \varepsilon^{-2} \log(\varepsilon^{-1}))$ that is independent of the dimension $m$: (i) functions of the inner product of the two variables, (ii) functions of the Euclidean distance between the variables, and (iii) shift-invariant positive-definite kernels. We extend our argument to tensor-train approximation of tensors generated with functions of the ''higher-order inner product'' of their multiple variables. We discuss our results in the context of low-rank approximation of (a) growing datasets and (b) attention in transformer neural networks.

Submitted 15 April, 2025; v1 submitted 3 July, 2024; originally announced July 2024.

Comments: Accepted for publication in SIAM Journal on Mathematics of Data Science

arXiv:2403.11768 [pdf, other]

doi 10.1137/24M1649186

Entrywise tensor-train approximation of large tensors via random embeddings

Authors: Stanislav Budzinskiy

Abstract: The theory of low-rank tensor-train approximation is well understood when the approximation error is measured in the Frobenius norm. The entrywise maximum norm is equally important but is significantly weaker for large tensors, making the estimates obtained via the Frobenius norm and norm equivalence pessimistic or even meaningless. In this article, we derive a direct estimate of the entrywise approximation error that is applicable in some of these cases. The estimate is given in terms of the higher-order generalization of the matrix factorization norm, and its proof is based on the tensor-structured Hanson--Wright inequality. The theoretical results are accompanied by numerical experiments carried out with the method of alternating projections.

Submitted 31 March, 2025; v1 submitted 18 March, 2024; originally announced March 2024.

Comments: Accepted for publication in SIAM Journal on Matrix Analysis and Applications

Journal ref: SIAM Journal on Matrix Analysis and Applications, 46(2):984-1005, 2025

arXiv:2312.12905 [pdf, other]

doi 10.1016/j.laa.2024.02.012

On the distance to low-rank matrices in the maximum norm

Authors: Stanislav Budzinskiy

Abstract: Every sufficiently big matrix with small spectral norm has a nearby low-rank matrix if the distance is measured in the maximum norm (Udell & Townsend, SIAM J Math Data Sci, 2019). We use the Hanson--Wright inequality to improve the estimate of the distance for matrices with incoherent column and row spaces. In numerical experiments with several classes of matrices we study how well the theoretical upper bound describes the approximation errors achieved with the method of alternating projections.

Submitted 19 February, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

Comments: Accepted version

Journal ref: Linear Algebra and its Applications, 688:44-58, 2024

arXiv:2308.16097 [pdf, ps, other]

Quasioptimal alternating projections and their use in low-rank approximation of matrices and tensors

Authors: Stanislav Budzinskiy

Abstract: We study the convergence of specific inexact alternating projections for two non-convex sets in a Euclidean space. The $σ$-quasioptimal metric projection ($σ\geq 1$) of a point $x$ onto a set $A$ consists of points in $A$ the distance to which is at most $σ$ times larger than the minimal distance $\mathrm{dist}(x,A)$. We prove that quasioptimal alternating projections, when one or both projections are quasioptimal, converge locally and linearly for super-regular sets with transversal intersection. The theory is motivated by the successful application of alternating projections to low-rank matrix and tensor approximation. We focus on two problems -- nonnegative low-rank approximation and low-rank approximation in the maximum norm -- and develop fast alternating-projection algorithms for matrices and tensor trains based on cross approximation and acceleration techniques. The numerical experiments confirm that the proposed methods are efficient and suggest that they can be used to regularise various low-rank computational routines.

Submitted 23 June, 2025; v1 submitted 30 August, 2023; originally announced August 2023.

Comments: Accepted for publication in Numerische Mathematik

arXiv:2209.02060 [pdf, other]

doi 10.1007/s40314-023-02211-2

Low-rank nonnegative tensor approximation via alternating projections and sketching

Authors: Azamat Sultonov, Sergey Matveev, Stanislav Budzinskiy

Abstract: We show how to construct nonnegative low-rank approximations of nonnegative tensors in Tucker and tensor train formats. We use alternating projections between the nonnegative orthant and the set of low-rank tensors, using STHOSVD and TTSVD algorithms, respectively, and further accelerate the alternating projections using randomized sketching. The numerical experiments on both synthetic data and hyperspectral images show the decay of the negative elements and that the error of the resulting approximation is close to the initial error obtained with STHOSVD and TTSVD. The proposed method for the Tucker case is superior to the previous ones in terms of computational complexity and decay of negative elements. The tensor train case, to the best of our knowledge, has not been studied before.

Submitted 24 April, 2023; v1 submitted 5 September, 2022; originally announced September 2022.

Comments: Accepted version

Journal ref: Comp. Appl. Math. 42, 68 (2023)

arXiv:2206.12486 [pdf, other]

doi 10.1134/S1995080223080103

Variational Bayesian inference for CP tensor completion with side information

Authors: Stanislav Budzinskiy, Nikolai Zamarashkin

Abstract: We propose a message passing algorithm, based on variational Bayesian inference, for low-rank tensor completion with automatic rank determination in the canonical polyadic format when additional side information (SI) is given. The SI comes in the form of low-dimensional subspaces that contain the fiber spans of the tensor (columns, rows, tubes, etc.). We validate the regularization properties induced by SI with extensive numerical experiments on synthetic and real-world data and present the results about tensor recovery and rank determination. The results show that the number of samples required for successful completion is significantly reduced in the presence of SI. We also discuss the origin of a bump in the phase transition curves that exists when the dimensionality of SI is comparable with that of the tensor.

Submitted 29 June, 2022; v1 submitted 24 June, 2022; originally announced June 2022.

Comments: added 1 citation

arXiv:2201.11154 [pdf, other]

doi 10.1515/rnam-2023-0009

Sketching for low-rank nonnegative matrix approximation: Numerical study

Authors: Sergey A. Matveev, Stanislav Budzinskiy

Abstract: We propose new approximate alternating projection methods, based on randomized sketching, for the low-rank nonnegative matrix approximation problem: find a low-rank approximation of a nonnegative matrix that is nonnegative, but whose factors can be arbitrary. We calculate the computational complexities of the proposed methods and evaluate their performance in numerical experiments. The comparison with the known deterministic alternating projection methods shows that the randomized approaches are faster and exhibit similar convergence properties.
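
As a rough illustration of the alternating-projection idea mentioned in this abstract, here is a minimal NumPy sketch. It uses the plain deterministic variant (clip negative entries, then take an exact truncated SVD), not the randomized sketching methods the paper proposes. The function names `truncated_svd`, `negative_part_norm`, and `alternating_projections`, the iteration count, and the random test matrix are all illustrative assumptions, not taken from the paper.

```python
import numpy as np


def truncated_svd(X, rank):
    """Best rank-`rank` approximation of X in the Frobenius norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]


def negative_part_norm(X):
    """Frobenius norm of the negative entries of X."""
    return np.linalg.norm(np.minimum(X, 0.0))


def alternating_projections(A, rank, n_iter=50):
    """Alternate between the nonnegative orthant and rank-`rank` matrices.

    Returns the last low-rank iterate; it has rank at most `rank`
    but may still contain small negative entries.
    """
    X = truncated_svd(A, rank)
    for _ in range(n_iter):
        X = np.maximum(X, 0.0)      # project onto the nonnegative orthant
        X = truncated_svd(X, rank)  # project back onto low-rank matrices
    return X


# Hypothetical test problem: a random nonnegative 40 x 40 matrix.
rng = np.random.default_rng(0)
A = rng.random((40, 40))
X0 = truncated_svd(A, 3)
X = alternating_projections(A, 3)
print("negative part before:", negative_part_norm(X0))
print("negative part after: ", negative_part_norm(X))
```

In exact arithmetic the Frobenius norm of the negative part cannot increase from one iterate to the next, because each new iterate is a closest rank-constrained matrix to the clipped previous one. How fast it decays depends on the matrix and the chosen rank.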

Submitted 24 April, 2023; v1 submitted 26 January, 2022; originally announced January 2022.

Comments: Accepted version

MSC Class: 65K10; 15A03; 65F30; 15A48; 34C30 ACM Class: G.1.3; G.1.2; F.2.1

Journal ref: Russian Journal of Numerical Analysis and Mathematical Modelling 38, no. 2 (2023): 99-114

arXiv:2110.03975 [pdf, other]

doi 10.1002/nla.2520

Tensor train completion: local recovery guarantees via Riemannian optimization

Authors: Stanislav Budzinskiy, Nikolai Zamarashkin

Abstract: In this work, we estimate the number of randomly selected elements of a tensor that with high probability guarantees local convergence of Riemannian gradient descent for tensor train completion. We derive a new bound for the orthogonal projections onto the tangent spaces based on the harmonic mean of the unfoldings' singular values and introduce a notion of core coherence for tensor trains. We also extend the results to tensor train completion with auxiliary subspace information and obtain the corresponding local convergence guarantees.

Submitted 30 August, 2023; v1 submitted 8 October, 2021; originally announced October 2021.

Comments: 1 figure added; Accepted version

Journal ref: Numerical Linear Algebra with Applications, 30(6):e2520, 2023

arXiv:2012.09003 [pdf, other]

doi 10.1103/PhysRevE.103.L040101

Hopf bifurcation in addition-shattering kinetics

Authors: Stanislav S. Budzinskiy, Sergey A. Matveev, Pavel L. Krapivsky

Abstract: In aggregation-fragmentation processes, a steady state is usually reached in the long time limit. This indicates the existence of a fixed point in the underlying system of ordinary differential equations. The next simplest possibility is an asymptotically periodic motion. Never-ending oscillations have not been rigorously established so far, although oscillations have been recently numerically detected in a few systems. For a class of addition-shattering processes, we provide convincing numerical evidence for never-ending oscillations in a certain region $\mathcal{U}$ of the parameter space. The processes which we investigate admit a fixed point that becomes unstable when parameters belong to $\mathcal{U}$ and never-ending oscillations effectively emerge through a Hopf bifurcation.

Submitted 16 December, 2020; originally announced December 2020.

Comments: 5 pages, 6 figures, 4 pages supplementary, 2 figures supplementary

MSC Class: 65L12; 65L15 ACM Class: G.1.7; G.1.3; I.6.6

Journal ref: Phys. Rev. E 103, 040101 (2021)

arXiv:2006.12798 [pdf, other]

Note: low-rank tensor train completion with side information based on Riemannian optimization

Authors: Stanislav Budzinskiy, Nikolai Zamarashkin

Abstract: We consider the low-rank tensor train completion problem when additional side information is available in the form of subspaces that contain the mode-$k$ fiber spans. We propose an algorithm based on Riemannian optimization to solve the problem. Numerical experiments show that the proposed algorithm requires far fewer known entries to recover the tensor compared to standard tensor train completion methods.

Submitted 23 June, 2020; originally announced June 2020.

arXiv:1909.00293 [pdf, ps, other]

Zeros of Bessel cross-products coming from oblique derivative boundary value problems

Authors: Stanislav Budzinskiy

Abstract: The paper is devoted to (combinations of) Bessel cross-products that arise from oblique derivative boundary value problems for the Laplacian in a circular annulus. We show that like their Neumann-Laplacian counterpart (and unlike the Dirichlet-Laplacian), they possess two kinds of zeros: those that can be derived by McMahon series and diverge to infinity in the limit, and exceptional ones that remain finite. For both cases we find asymptotic expressions for a fixed oblique angle and vanishing thickness of the annulus. We further present plots of numerically computed zeros and discuss their behaviour when the oblique angle changes and the thickness remains fixed.

Submitted 31 August, 2019; originally announced September 2019.

Comments: 11 pages, 3 figures

Showing 1–13 of 13 results for author: Budzinskiy, S