-
Stochastic Rounding 2.0, with a View towards Complexity Analysis
Authors:
Petros Drineas,
Ilse C. F. Ipsen
Abstract:
Stochastic Rounding is a probabilistic rounding mode that is surprisingly effective in large-scale computations and low-precision arithmetic. Its random nature promotes error cancellation rather than error accumulation, resulting in slower growth of roundoff errors as the problem size increases, especially when compared to traditional deterministic rounding methods, such as rounding-to-nearest. We advocate for SR as a foundational tool in the complexity analysis of algorithms, and suggest several research directions.
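To illustrate the error-cancellation effect described in this abstract, here is a minimal Python sketch. It is not taken from the paper: the fixed rounding grid `h`, the helper names, and the toy accumulation loop are assumptions chosen for brevity, whereas real floating-point formats have a spacing that varies with magnitude.

```python
import math
import random

def stochastic_round(x, h, rng):
    """Round x to a multiple of h (assumed fixed grid). The upper neighbor is
    chosen with probability equal to the relative distance from the lower one,
    so the rounded value equals x in expectation."""
    lower = math.floor(x / h) * h
    frac = (x - lower) / h  # lies in [0, 1)
    return lower + h if rng.random() < frac else lower

def round_to_nearest(x, h):
    """Deterministic rounding to the nearest multiple of h."""
    return round(x / h) * h

def accumulate(increment, n, h, rounder):
    """Add `increment` n times, rounding the running total after every step."""
    total = 0.0
    for _ in range(n):
        total = rounder(total + increment, h)
    return total

rng = random.Random(0)  # fixed seed for reproducibility
h = 1.0                 # grid spacing, much coarser than the increment
rn_total = accumulate(0.25, 1000, h, round_to_nearest)
sr_total = accumulate(0.25, 1000, h, lambda x, step: stochastic_round(x, step, rng))
```

In this toy setting the exact sum is 250. Round-to-nearest discards every increment, so `rn_total` stays at zero, while `sr_total` fluctuates around 250 because the individual rounding errors have zero mean.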
Submitted 14 October, 2024;
originally announced October 2024.
-
Stable Rank and Intrinsic Dimension of Real and Complex Matrices
Authors:
Ilse C. F. Ipsen,
Arvind K. Saibaba
Abstract:
The notion of 'stable rank' of a matrix is central to the analysis of randomized matrix algorithms, covariance estimation, deep neural networks, and recommender systems. We compare the properties of the stable rank and intrinsic dimension of real and complex matrices to those of the classical rank. Basic proofs and examples illustrate that the stable rank does not satisfy any of the fundamental rank properties, while the intrinsic dimension satisfies a few. In particular, the stable rank and intrinsic dimension of a submatrix can exceed those of the original matrix; adding a Hermitian positive semi-definite matrix can lower the intrinsic dimension of the sum; and multiplication by a nonsingular matrix can drastically change the stable rank and the intrinsic dimension. We generalize the concept of stable rank to the p-stable rank in any Schatten p-norm, thereby unifying the concepts of stable rank and intrinsic dimension: The stable rank is the 2-stable rank, while the intrinsic dimension is the 1-stable rank of a Hermitian positive semi-definite matrix. We derive sum and product inequalities for the pth root of the p-stable rank, and show that it is well-conditioned in the norm-wise absolute sense. The conditioning improves if the matrix and the perturbation are Hermitian positive semi-definite.
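The following Python sketch makes the submatrix claim concrete. It assumes that the p-stable rank is the pth power of the ratio of the Schatten p-norm to the two-norm, which is consistent with the abstract's description but may differ in detail from the paper's definition. To stay free of linear algebra libraries, the hypothetical helper `p_stable_rank` takes the singular values directly.

```python
def p_stable_rank(singular_values, p):
    """Assumed definition: (Schatten p-norm / two-norm) ** p,
    that is, sum(s_i ** p) / max(s_i) ** p for singular values s_i."""
    s = [abs(v) for v in singular_values]
    largest = max(s)
    if largest == 0:
        raise ValueError("p-stable rank is undefined for the zero matrix")
    return sum(v ** p for v in s) / largest ** p

full = [10.0, 1.0, 1.0]  # singular values of diag(10, 1, 1)
sub = [1.0, 1.0]         # singular values of its trailing 2x2 block diag(1, 1)

stable_full = p_stable_rank(full, 2)  # stable rank of the full matrix
stable_sub = p_stable_rank(sub, 2)    # stable rank of the submatrix
intdim_full = p_stable_rank(full, 1)  # trace / two-norm for this PSD matrix
```

Here the full matrix has stable rank 102/100 = 1.02, whereas its 2x2 submatrix has stable rank 2, so the submatrix exceeds the original. The classical rank can never behave this way.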
Submitted 19 December, 2024; v1 submitted 31 July, 2024;
originally announced July 2024.
-
A randomized preconditioned Cholesky-QR algorithm
Authors:
James E. Garrison,
Ilse C. F. Ipsen
Abstract:
We present and analyze rpCholesky-QR, a randomized preconditioned Cholesky-QR algorithm for computing the thin QR factorization of real $m \times n$ matrices with rank $n$. rpCholesky-QR has a low orthogonalization error, a residual on the order of machine precision, and does not break down for highly singular matrices. We derive rigorous and interpretable two-norm perturbation bounds for rpCholesky-QR that require a minimum of assumptions. Numerical experiments corroborate the accuracy of rpCholesky-QR for preconditioners sampled from as few as 3n rows, and illustrate that the two-norm deviation from orthonormality increases with only the condition number of the preconditioned matrix, rather than its square -- even if the original matrix is numerically singular.
Submitted 17 June, 2024;
originally announced June 2024.
-
Stochastic Rounding Implicitly Regularizes Tall-and-Thin Matrices
Authors:
Gregory Dexter,
Christos Boutsikas,
Linkai Ma,
Ilse C. F. Ipsen,
Petros Drineas
Abstract:
Motivated by the popularity of stochastic rounding in the context of machine learning and the training of large-scale deep neural network models, we consider stochastic nearness rounding of real matrices $\mathbf{A}$ with many more rows than columns. We provide novel theoretical evidence, supported by extensive experimental evaluation, that, with high probability, the smallest singular value of a stochastically rounded matrix is well bounded away from zero -- regardless of how close $\mathbf{A}$ is to being rank deficient and even if $\mathbf{A}$ is rank-deficient. In other words, stochastic rounding implicitly regularizes tall and skinny matrices $\mathbf{A}$ so that the rounded version has full column rank. Our proofs leverage powerful results in random matrix theory, and the idea that stochastic rounding errors do not concentrate in low-dimensional column spaces.
Submitted 6 December, 2024; v1 submitted 18 March, 2024;
originally announced March 2024.
-
Small singular values can increase in lower precision
Authors:
Christos Boutsikas,
Petros Drineas,
Ilse C. F. Ipsen
Abstract:
We perturb a real matrix $A$ of full column rank, and derive lower bounds for the smallest singular values of the perturbed matrix, in terms of normwise absolute perturbations. Our bounds, which extend existing lower-order expressions, demonstrate the potential increase in the smallest singular values, and represent a qualitative model for the increase in the small singular values after a matrix has been downcast to a lower arithmetic precision. Numerical experiments confirm the qualitative validity of this model and its ability to predict singular value changes in the presence of decreased arithmetic precision.
Submitted 20 February, 2024; v1 submitted 6 March, 2023;
originally announced March 2023.
-
Statistical Properties of the Probabilistic Numeric Linear Solver BayesCG
Authors:
Tim W. Reid,
Ilse C. F. Ipsen,
Jon Cockayne,
Chris J. Oates
Abstract:
We analyse the calibration of BayesCG under the Krylov prior, a probabilistic numeric extension of the Conjugate Gradient (CG) method for solving systems of linear equations with symmetric positive definite coefficient matrix. Calibration refers to the statistical quality of the posterior covariances produced by a solver. Since BayesCG is not calibrated in the strict existing notion, we propose instead two test statistics that are necessary but not sufficient for calibration: the Z-statistic and the new S-statistic. We show analytically and experimentally that under low-rank approximate Krylov posteriors, BayesCG exhibits desirable properties of a calibrated solver, is only slightly optimistic, and is computationally competitive with CG.
Submitted 7 August, 2022;
originally announced August 2022.
-
Robust Parameter Identifiability Analysis via Column Subset Selection
Authors:
Katherine J. Pearce,
Ilse C. F. Ipsen,
Mansoor A. Haider,
Arvind K. Saibaba,
Ralph C. Smith
Abstract:
We advocate a numerically reliable and accurate approach for practical parameter identifiability analysis: Applying column subset selection (CSS) to the sensitivity matrix, instead of computing an eigenvalue decomposition of the Fisher information matrix. Identifiability analysis via CSS has three advantages: (i) It quantifies reliability of the subsets of parameters selected as identifiable and unidentifiable. (ii) It establishes criteria for comparing the accuracy of different algorithms. (iii) The implementations are numerically more accurate and reliable than eigenvalue methods applied to the Fisher matrix, yet without an increase in computational cost. The effectiveness of the CSS methods is illustrated with extensive numerical experiments on sensitivity matrices from six physical models, as well as on adversarial synthetic matrices. Among the CSS methods, we recommend an implementation based on the strong rank-revealing QR algorithm because of its rigorous accuracy guarantees for both identifiable and non-identifiable parameters.
Submitted 9 May, 2022;
originally announced May 2022.
-
Precision-aware Deterministic and Probabilistic Error Bounds for Floating Point Summation
Authors:
Eric Hallman,
Ilse C. F. Ipsen
Abstract:
We analyze the forward error in the floating point summation of real numbers, for computations in low precision or extreme-scale problem dimensions that push the limits of the precision. We present a systematic recurrence for a martingale on a computational tree, which leads to explicit and interpretable bounds without asymptotic big-O terms. Two probability parameters strengthen the precision-awareness of our bounds: one parameter controls the first order terms in the summation error, while the second one is designed for controlling higher order terms in low precision or extreme-scale problem dimensions. Our systematic approach yields new deterministic and probabilistic error bounds for three classes of mono-precision algorithms: general summation, shifted general summation, and compensated (sequential) summation. Extension of our systematic error analysis to mixed-precision summation algorithms that allow any number of precisions yields the first probabilistic bounds for the mixed-precision FABsum algorithm. Numerical experiments illustrate that the probabilistic bounds are accurate, and that among the three classes of mono-precision algorithms, compensated summation is generally the most accurate. As for mixed precision algorithms, our recommendation is to minimize the magnitude of intermediate partial sums relative to the precision in which they are computed.
Submitted 29 March, 2022;
originally announced March 2022.
-
Monte Carlo Methods for Estimating the Diagonal of a Real Symmetric Matrix
Authors:
Eric Hallman,
Ilse C. F. Ipsen,
Arvind Saibaba
Abstract:
For real symmetric matrices that are accessible only through matrix vector products, we present Monte Carlo estimators for computing the diagonal elements. Our probabilistic bounds for normwise absolute and relative errors apply to Monte Carlo estimators based on random Rademacher, sparse Rademacher, normalized and unnormalized Gaussian vectors, and to vectors with bounded fourth moments. The novel use of matrix concentration inequalities in our proofs represents a systematic model for future analyses. Our bounds mostly do not depend on the matrix dimension, target different error measures than existing work, and imply that the accuracy of the estimators increases with the diagonal dominance of the matrix. An application to derivative-based global sensitivity metrics corroborates this, as do numerical experiments on synthetic test matrices. We recommend against the use in practice of sparse Rademacher vectors, which are the basis for many randomized sketching and sampling algorithms, because they tend to deliver barely a digit of accuracy even under large sampling amounts.
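A minimal Python sketch of one common Monte Carlo diagonal estimator of this kind is given below: it averages the elementwise products of z and Az over random Rademacher vectors z, touching the matrix only through a matrix-vector product. The function names, the 3x3 test matrix, and the sample count are assumptions for illustration, and the estimators analyzed in the paper may differ in normalization or other details.

```python
import random

def estimate_diagonal(matvec, n, num_samples, rng):
    """Average z[i] * (A z)[i] over Rademacher vectors z (entries +1 or -1).
    Only matrix-vector products with A are required."""
    acc = [0.0] * n
    for _ in range(num_samples):
        z = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        az = matvec(z)
        for i in range(n):
            acc[i] += z[i] * az[i]
    return [a / num_samples for a in acc]

# Small symmetric, diagonally dominant test matrix (hypothetical data).
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 0.5],
     [0.0, 0.5, 5.0]]

def matvec(x):
    """Black-box product A x; the estimator never reads A directly."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

diag_est = estimate_diagonal(matvec, 3, 2000, random.Random(0))
```

Because each entry of z squares to one, the diagonal contributions are reproduced exactly and only the off-diagonal entries add noise. This matches the abstract's remark that accuracy improves with the diagonal dominance of the matrix.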
Submitted 17 March, 2022; v1 submitted 6 February, 2022;
originally announced February 2022.
-
Deterministic and Probabilistic Error Bounds for Floating Point Summation Algorithms
Authors:
Eric Hallman,
Ilse C. F. Ipsen
Abstract:
We analyse the forward error in the floating point summation of real numbers, from algorithms that do not require recourse to higher precision or better hardware. We derive informative explicit expressions, and new deterministic and probabilistic bounds for errors in three classes of algorithms: general summation, shifted general summation, and compensated (sequential) summation. Our probabilistic bounds for general and shifted general summation hold to all orders. For compensated summation, we also present deterministic and probabilistic first and second order bounds, with a first order bound that differs from existing ones. Numerical experiments illustrate that the bounds are informative and that among the three algorithm classes, compensated summation is generally the most accurate method.
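For readers unfamiliar with compensated summation, the Python sketch below contrasts plain sequential summation with Kahan's classical compensated variant, which is assumed here to be representative of the compensated (sequential) class named in the abstract. The test data and function names are illustrative choices, not taken from the paper.

```python
import math

def naive_sum(values):
    """Plain sequential (recursive) summation in double precision."""
    total = 0.0
    for v in values:
        total += v
    return total

def compensated_sum(values):
    """Kahan's compensated summation: `comp` carries the rounding error
    of the previous addition and feeds it back into the next one."""
    total = 0.0
    comp = 0.0
    for v in values:
        y = v - comp             # apply the stored correction
        t = total + y            # low-order bits of y may be lost here
        comp = (t - total) - y   # recover what was lost
        total = t
    return total

# One large term followed by many terms below half a unit in the last place of 1.0.
data = [1.0] + [1e-16] * 100000
naive = naive_sum(data)
kahan = compensated_sum(data)
reference = math.fsum(data)  # correctly rounded sum from the standard library
```

On this input the naive loop never moves away from 1.0, because each small term is rounded away on its own, whereas the compensated loop agrees with `math.fsum` to within a few units of roundoff.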
Submitted 4 July, 2021;
originally announced July 2021.
-
Probabilistic Iterative Methods for Linear Systems
Authors:
Jon Cockayne,
Ilse C. F. Ipsen,
Chris J. Oates,
Tim W. Reid
Abstract:
This paper presents a probabilistic perspective on iterative methods for approximating the solution $\mathbf{x}_* \in \mathbb{R}^d$ of a nonsingular linear system $\mathbf{A} \mathbf{x}_* = \mathbf{b}$. In the approach a standard iterative method on $\mathbb{R}^d$ is lifted to act on the space of probability distributions $\mathcal{P}(\mathbb{R}^d)$. Classically, an iterative method produces a sequence $\mathbf{x}_m$ of approximations that converge to $\mathbf{x}_*$. The output of the iterative methods proposed in this paper is, instead, a sequence of probability distributions $\mu_m \in \mathcal{P}(\mathbb{R}^d)$. The distributional output both provides a "best guess" for $\mathbf{x}_*$, for example as the mean of $\mu_m$, and also probabilistic uncertainty quantification for the value of $\mathbf{x}_*$ when it has not been exactly determined. Theoretical analysis is provided in the prototypical case of a stationary linear iterative method. In this setting we characterise both the rate of contraction of $\mu_m$ to an atomic measure on $\mathbf{x}_*$ and the nature of the uncertainty quantification being provided. We conclude with an empirical illustration that highlights the insight into solution uncertainty that can be provided by probabilistic iterative methods.
Submitted 11 January, 2021; v1 submitted 23 December, 2020;
originally announced December 2020.
-
BayesCG As An Uncertainty Aware Version of CG
Authors:
Tim W. Reid,
Ilse C. F. Ipsen,
Jon Cockayne,
Chris J. Oates
Abstract:
The Bayesian Conjugate Gradient method (BayesCG) is a probabilistic generalization of the Conjugate Gradient method (CG) for solving linear systems with real symmetric positive definite coefficient matrices. Our CG-based implementation of BayesCG under a structure-exploiting prior distribution represents an 'uncertainty-aware' version of CG. Its output consists of CG iterates and posterior covariances that can be propagated to subsequent computations. The covariances have low rank and are maintained in factored form. This allows easy generation of accurate samples to probe uncertainty in downstream computations. Numerical experiments confirm the effectiveness of the low-rank posterior covariances.
Submitted 3 October, 2022; v1 submitted 7 August, 2020;
originally announced August 2020.
-
Multiplicative Perturbation Bounds for Multivariate Multiple Linear Regression in Schatten $p$-Norms
Authors:
Jocelyn T. Chi,
Ilse C. F. Ipsen
Abstract:
Multivariate multiple linear regression (MMLR), which occurs in a number of practical applications, generalizes traditional least squares (multivariate linear regression) to multiple right-hand sides. We extend recent MLR analyses to sketched MMLR in general Schatten $p$-norms by interpreting the sketched problem as a multiplicative perturbation. Our work represents an extension of Maher's results on Schatten $p$-norms. We derive expressions for the exact and perturbed solutions in terms of projectors for easy geometric interpretation. We also present a geometric interpretation of the action of the sketching matrix in terms of relevant subspaces. We show that a key term in assessing the accuracy of the sketched MMLR solution can be viewed as a tangent of a largest principal angle between subspaces under some assumptions. Our results enable additional interpretation of the difference between an orthogonal and oblique projector with the same range.
Submitted 12 July, 2020;
originally announced July 2020.
-
Probabilistic Error Analysis for Inner Products
Authors:
Ilse C. F. Ipsen,
Hua Zhou
Abstract:
Probabilistic models are proposed for bounding the forward error in the numerically computed inner product (dot product, scalar product) between two real $n$-vectors. We derive probabilistic perturbation bounds, as well as probabilistic roundoff error bounds for the sequential accumulation of the inner product. These bounds are non-asymptotic, explicit, and make minimal assumptions on perturbations and roundoffs.
The perturbations are represented as independent, bounded, zero-mean random variables, and the probabilistic perturbation bound is based on Azuma's inequality. The roundoffs are also represented as bounded, zero-mean random variables. The first probabilistic bound assumes that the roundoffs are independent, while the second one does not. For the latter, we construct a martingale that mirrors the sequential order of computations.
Numerical experiments confirm that our bounds are more informative, often by several orders of magnitude, than traditional deterministic bounds -- even for small vector dimensions $n$ and very stringent success probabilities. In particular the probabilistic roundoff error bounds are functions of $\sqrt{n}$ rather than $n$, thus giving a quantitative confirmation of Wilkinson's intuition. The paper concludes with a critical assessment of the probabilistic approach.
Submitted 25 June, 2019;
originally announced June 2019.
-
Probabilistic Linear Solvers: A Unifying View
Authors:
Simon Bartels,
Jon Cockayne,
Ilse C. F. Ipsen,
Philipp Hennig
Abstract:
Several recent works have developed a new, probabilistic interpretation for numerical algorithms solving linear systems in which the solution is inferred in a Bayesian framework, either directly or by inferring the unknown action of the matrix inverse. These approaches have typically focused on replicating the behavior of the conjugate gradient method as a prototypical iterative method. In this work surprisingly general conditions for equivalence of these disparate methods are presented. We also describe connections between probabilistic linear solvers and projection methods for linear systems, providing a probabilistic interpretation of a far more general class of iterative methods. In particular, this provides such an interpretation of the generalised minimum residual method. A probabilistic view of preconditioning is also introduced. These developments unify the literature on probabilistic linear solvers, and provide foundational connections to the literature on iterative solvers for linear systems.
Submitted 17 October, 2018; v1 submitted 8 October, 2018;
originally announced October 2018.
-
A Projector-Based Approach to Quantifying Total and Excess Uncertainties for Sketched Linear Regression
Authors:
Jocelyn T. Chi,
Ilse C. F. Ipsen
Abstract:
Linear regression is a classic method of data analysis. In recent years, sketching -- a method of dimension reduction using random sampling, random projections, or both -- has gained popularity as an effective computational approximation when the number of observations greatly exceeds the number of variables. In this paper, we address the following question: How does sketching affect the statistical properties of the solution and key quantities derived from it?
To answer this question, we present a projector-based approach to sketched linear regression that is exact and that requires minimal assumptions on the sketching matrix. Therefore, downstream analyses hold exactly and generally for all sketching schemes. Additionally, a projector-based approach enables derivation of key quantities from classic linear regression that account for the combined model- and algorithm-induced uncertainties. We demonstrate the usefulness of a projector-based approach in quantifying and enabling insight on excess uncertainties and bias-variance decompositions for sketched linear regression. Finally, we demonstrate how the insights from our projector-based analyses can be used to produce practical sketching diagnostics to aid the design of judicious sketching schemes.
Submitted 3 August, 2020; v1 submitted 17 August, 2018;
originally announced August 2018.
-
A Bayesian Conjugate Gradient Method
Authors:
Jon Cockayne,
Chris Oates,
Ilse Ipsen,
Mark Girolami
Abstract:
A fundamental task in numerical computation is the solution of large linear systems. The conjugate gradient method is an iterative method which offers rapid convergence to the solution, particularly when an effective preconditioner is employed. However, for more challenging systems a substantial error can be present even after many iterations have been performed. The estimates obtained in this case are of little value unless further information can be provided about the numerical error. In this paper we propose a novel statistical model for this numerical error set in a Bayesian framework. Our approach is a strict generalisation of the conjugate gradient method, which is recovered as the posterior mean for a particular choice of prior. The estimates obtained are analysed with Krylov subspace methods and a contraction result for the posterior is presented. The method is then analysed in a simulation study as well as being applied to a challenging problem in medical imaging.
Submitted 17 December, 2018; v1 submitted 16 January, 2018;
originally announced January 2018.
-
A Probabilistic Subspace Bound with Application to Active Subspaces
Authors:
John T. Holodnak,
Ilse C. F. Ipsen,
Ralph C. Smith
Abstract:
Given a real symmetric positive semi-definite matrix E, and an approximation S that is a sum of n independent matrix-valued random variables, we present bounds on the relative error in S due to randomization. The bounds do not depend on the matrix dimensions but only on the numerical rank (intrinsic dimension) of E. Our approach resembles the low-rank approximation of kernel matrices from random features, but our accuracy measures are more stringent.
In the context of parameter selection based on active subspaces, where S is computed via Monte Carlo sampling, we present a bound on the number of samples so that with high probability the angle between the dominant subspaces of E and S is less than a user-specified tolerance. This is a substantial improvement over existing work, as it is a non-asymptotic and fully explicit bound on the sampling amount n, and it allows the user to tune the success probability. It also suggests that Monte Carlo sampling can be efficient in the presence of many parameters, as long as the underlying function f is sufficiently smooth.
Submitted 2 January, 2018;
originally announced January 2018.
-
Low-Rank Matrix Approximations Do Not Need a Singular Value Gap
Authors:
Petros Drineas,
Ilse C. F. Ipsen
Abstract:
This is a systematic investigation into the sensitivity of low-rank approximations of real matrices. We show that the low-rank approximation errors, in the two-norm, Frobenius norm and more generally, any Schatten p-norm, are insensitive to additive rank-preserving perturbations in the projector basis; and to matrix perturbations that are additive or change the number of columns (including multiplicative perturbations). Thus, low-rank matrix approximations are always well-posed and do not require a singular value gap. In the presence of a singular value gap, connections are established between low-rank approximations and subspace angles.
Submitted 2 January, 2018;
originally announced January 2018.
-
Eigenvector continuation with subspace learning
Authors:
Dillon Frame,
Rongzheng He,
Ilse Ipsen,
Daniel Lee,
Dean Lee,
Ermal Rrapaj
Abstract:
A common challenge faced in quantum physics is finding the extremal eigenvalues and eigenvectors of a Hamiltonian matrix in a vector space so large that linear algebra operations on general vectors are not possible. There are numerous efficient methods developed for this task, but they generally fail when some control parameter in the Hamiltonian matrix exceeds some threshold value. In this work we present a new technique called eigenvector continuation that can extend the reach of these methods. The key insight is that while an eigenvector resides in a linear space with enormous dimensions, the eigenvector trajectory generated by smooth changes of the Hamiltonian matrix is well approximated by a very low-dimensional manifold. We prove this statement using analytic function theory and propose an algorithm to solve for the extremal eigenvectors. We benchmark the method using several examples from quantum many-body theory.
Submitted 5 June, 2018; v1 submitted 19 November, 2017;
originally announced November 2017.
-
Structural Convergence Results for Approximation of Dominant Subspaces from Block Krylov Spaces
Authors:
Petros Drineas,
Ilse Ipsen,
Eugenia-Maria Kontopoulou,
Malik Magdon-Ismail
Abstract:
This paper is concerned with approximating the dominant left singular vector space of a real matrix $A$ of arbitrary dimension, from block Krylov spaces generated by the matrix $AA^T$ and the block vector $AX$. Two classes of results are presented. First are bounds on the distance, in the two and Frobenius norms, between the Krylov space and the target space. The distance is expressed in terms of principal angles. Second are quality of approximation bounds, relative to the best approximation in the Frobenius norm. For starting guesses $X$ of full column-rank, the bounds depend on the tangent of the principal angles between $X$ and the dominant right singular vector space of $A$. The results presented here form the structural foundation for the analysis of randomized Krylov space methods. The innovative feature is a combination of traditional Lanczos convergence analysis with optimal approximations via least squares problems.
Submitted 9 May, 2017; v1 submitted 2 September, 2016;
originally announced September 2016.
-
Randomized Matrix-free Trace and Log-Determinant Estimators
Authors:
Arvind K. Saibaba,
Alen Alexanderian,
Ilse C. F. Ipsen
Abstract:
We present randomized algorithms for estimating the trace and determinant of Hermitian positive semi-definite matrices. The algorithms are based on subspace iteration, and access the matrix only through matrix-vector products. We analyse the error due to randomization, for starting guesses whose elements are Gaussian or Rademacher random variables. The analysis is cleanly separated into a structural (deterministic) part followed by a probabilistic part. Our absolute bounds for the expectation and concentration of the estimators are non-asymptotic and informative even for matrices of low dimension. For the trace estimators, we also present asymptotic bounds on the number of samples (columns of the starting guess) required to achieve a user-specified relative error. Numerical experiments illustrate the performance of the estimators and the tightness of the bounds on low-dimensional matrices; and on a challenging application in uncertainty quantification arising from Bayesian optimal experimental design.
Submitted 15 February, 2017; v1 submitted 16 May, 2016;
originally announced May 2016.
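The abstract above describes trace estimation by subspace iteration using only matrix-vector products. The following is a minimal sketch of one such estimator, not the authors' code: it assumes the estimate is trace(Q^T A Q), where Q is an orthonormal basis for the range of A^q Ω and Ω is a Gaussian starting guess. The function names, the parameter names `k` and `q`, and the use of Gram-Schmidt for the orthonormalization are choices made for this illustration.

```python
import math
import random


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def orthonormalize_columns(Y):
    """Modified Gram-Schmidt on the columns of Y; returns Q as a list of rows."""
    basis = []
    for col in zip(*Y):
        w = list(col)
        for q in basis:
            dot = sum(qi * wi for qi, wi in zip(q, w))
            w = [wi - dot * qi for qi, wi in zip(q, w)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm < 1e-14:
            raise ValueError("sampled subspace is numerically rank deficient")
        basis.append([wi / norm for wi in w])
    return [list(row) for row in zip(*basis)]


def subspace_trace(A, k, q, rng):
    """Estimate trace(A) for symmetric positive semi-definite A.

    k   : number of columns in the Gaussian starting guess (assumed name)
    q   : number of multiplications by A, q >= 1 (assumed name)
    rng : a random.Random instance
    """
    n = len(A)
    Y = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n)]
    for _ in range(q):
        Y = matmul(A, Y)          # A is touched only through products with vectors
    Q = orthonormalize_columns(Y)
    AQ = matmul(A, Q)
    # trace(Q^T A Q) is the sum over all entries of Q[i][j] * (A Q)[i][j]
    return sum(Q[i][j] * AQ[i][j] for i in range(n) for j in range(k))


if __name__ == "__main__":
    A = [[5.0, 0.0, 0.0, 0.0],
         [0.0, 3.0, 0.0, 0.0],
         [0.0, 0.0, 0.0, 0.0],
         [0.0, 0.0, 0.0, 0.0]]
    print(subspace_trace(A, k=2, q=1, rng=random.Random(0)))
```

In this sketch the estimate never exceeds trace(A) for a positive semi-definite matrix, and it recovers the trace up to rounding when the rank of A is at most `k`, as in the usage example.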
-
Conditioning of Leverage Scores and Computation by QR Decomposition
Authors:
John T. Holodnak,
Ilse C. F. Ipsen,
Thomas A. Wentworth
Abstract:
The leverage scores of a full-column rank matrix A are the squared row norms of any orthonormal basis for range(A). We show that corresponding leverage scores of two matrices A and A + ΔA are close in the relative sense, if they have large magnitude and if all principal angles between the column spaces of A and A + ΔA are small. We also show three classes of bounds that are based on perturbation results of QR decompositions. They demonstrate that relative differences between individual leverage scores strongly depend on the particular type of perturbation ΔA. The bounds imply that the relative accuracy of an individual leverage score depends on: its magnitude and the two-norm condition number of A, if ΔA is a general perturbation; the two-norm condition number of A, if ΔA is a perturbation with the same norm-wise row-scaling as A; (to first order) neither condition number nor leverage score magnitude, if ΔA is a component-wise row-scaled perturbation. Numerical experiments confirm the qualitative and quantitative accuracy of our bounds.
Submitted 22 May, 2015; v1 submitted 5 February, 2014;
originally announced February 2014.
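The definition in the abstract above (leverage scores as squared row norms of an orthonormal basis for range(A)) can be illustrated with a short sketch. It is not the authors' code: the orthonormal basis is obtained here by modified Gram-Schmidt rather than a Householder QR decomposition, and the function names are hypothetical.

```python
import math


def orthonormal_basis(A):
    """Modified Gram-Schmidt on the columns of A (a list of rows).

    Returns Q, a list of rows, whose columns are an orthonormal basis
    for range(A). A is assumed to have full column rank.
    """
    basis = []
    for col in zip(*A):
        w = list(col)
        for q in basis:
            dot = sum(qi * wi for qi, wi in zip(q, w))
            w = [wi - dot * qi for qi, wi in zip(q, w)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm == 0.0:
            raise ValueError("A must have full column rank")
        basis.append([wi / norm for wi in w])
    return [list(row) for row in zip(*basis)]


def leverage_scores(A):
    """Squared two-norms of the rows of an orthonormal basis for range(A)."""
    Q = orthonormal_basis(A)
    return [sum(x * x for x in row) for row in Q]


if __name__ == "__main__":
    # Design matrix of a straight-line fit at the points 0, 1, 2, 3.
    A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
    print(leverage_scores(A))  # approximately [0.7, 0.3, 0.3, 0.7]
```

The scores lie between 0 and 1 and sum to the number of columns of A, which gives a quick sanity check on any implementation.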
-
kappa_SQ: A Matlab package for randomized sampling of matrices with orthonormal columns
Authors:
Thomas Wentworth,
Ilse Ipsen
Abstract:
The kappa_SQ software package is designed to assist researchers working on randomized row sampling. The package contains a collection of Matlab functions along with a GUI that ties them all together and provides a platform for the user to perform experiments. In particular, kappa_SQ is designed to do experiments related to the two-norm condition number of a sampled matrix, $κ(SQ)$, where $S$ is a row sampling matrix and $Q$ is a tall and skinny matrix with orthonormal columns. Via a simple GUI, kappa_SQ can generate test matrices, perform various types of row sampling, measure $κ(SQ)$, calculate bounds and produce high quality plots of the results. All of the important codes are written in separate Matlab function files in a standard format which makes it easy for a user to either use the codes by themselves or incorporate their own codes into the kappa_SQ package.
Submitted 4 February, 2014;
originally announced February 2014.
-
Randomized Approximation of the Gram Matrix: Exact Computation and Probabilistic Bounds
Authors:
John T. Holodnak,
Ilse C. F. Ipsen
Abstract:
Given a real matrix A with n columns, the problem is to approximate the Gram product AA^T by c << n weighted outer products of columns of A. Necessary and sufficient conditions for the exact computation of AA^T (in exact arithmetic) from c >= rank(A) columns depend on the right singular vector matrix of A. For a Monte-Carlo matrix multiplication algorithm by Drineas et al. that samples outer products, we present probabilistic bounds for the 2-norm relative error due to randomization. The bounds depend on the stable rank or the rank of A, but not on the matrix dimensions. Numerical experiments illustrate that the bounds are informative, even for stringent success probabilities and matrices of small dimension. We also derive bounds for the smallest singular value and the condition number of matrices obtained by sampling rows from orthonormal matrices.
Submitted 15 May, 2014; v1 submitted 5 October, 2013;
originally announced October 2013.
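The Monte-Carlo approximation of AA^T by weighted outer products of sampled columns, mentioned in the abstract above, can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: columns are sampled with replacement with probabilities proportional to their squared norms, and each sampled outer product is weighted by 1/(c p_j). The abstract does not specify the sampling probabilities, so that choice and the function name `sampled_gram` are assumptions.

```python
import random


def sampled_gram(A, c, rng):
    """Approximate A A^T from c sampled columns of A (a list of rows).

    Column j is drawn with probability p_j = ||a_j||^2 / ||A||_F^2 (assumed),
    and each draw contributes a_j a_j^T / (c * p_j), which makes the
    approximation an unbiased estimator of A A^T.
    """
    m, n = len(A), len(A[0])
    col_norms_sq = [sum(A[i][j] ** 2 for i in range(m)) for j in range(n)]
    frob_sq = sum(col_norms_sq)
    probs = [s / frob_sq for s in col_norms_sq]
    G = [[0.0] * m for _ in range(m)]
    # Zero columns have probability zero and are never drawn.
    for j in rng.choices(range(n), weights=probs, k=c):
        w = 1.0 / (c * probs[j])
        for r in range(m):
            for s in range(m):
                G[r][s] += w * A[r][j] * A[s][j]
    return G


if __name__ == "__main__":
    A = [[1.0, 2.0, 0.0, 3.0],
         [0.0, 1.0, 4.0, 1.0]]
    for row in sampled_gram(A, c=3, rng=random.Random(0)):
        print(row)
```

With these particular probabilities every sampled term has trace ||A||_F^2 / c, so the trace of the approximation equals the trace of AA^T for any sample; only the remaining entries fluctuate.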
-
The Effect of Coherence on Sampling from Matrices with Orthonormal Columns, and Preconditioned Least Squares Problems
Authors:
Ilse C. F. Ipsen,
Thomas Wentworth
Abstract:
Motivated by the least squares solver Blendenpik, we investigate three strategies for uniform sampling of rows from m x n matrices Q with orthonormal columns. The goal is to determine, with high probability, how many rows are required so that the sampled matrices have full rank and are well-conditioned with respect to inversion.
Extensive numerical experiments illustrate that the three sampling strategies (without replacement, with replacement, and Bernoulli sampling) behave almost identically, for small to moderate amounts of sampling. In particular, sampled matrices of full rank tend to have two-norm condition numbers of at most 10.
We derive a bound on the condition number of the sampled matrices in terms of the coherence μ of Q. This bound applies to all three sampling strategies; it implies a (not necessarily tight) lower bound of O(m μ ln(n)) for the number of sampled rows; and it is realistic and informative even for matrices of small dimension and the stringent requirement of a 99 percent success probability.
For uniform sampling with replacement we derive a potentially tighter condition number bound in terms of the leverage scores of Q. To obtain a more easily computable version of this bound, in terms of just the largest leverage scores, we first derive a general bound on the two-norm of diagonally scaled matrices.
To facilitate the numerical experiments and test the tightness of the bounds, we present algorithms to generate matrices with user-specified coherence and leverage scores. These algorithms, the three sampling strategies, and a large variety of condition number bounds are implemented in the Matlab toolbox kappa_SQ_v3.
Submitted 4 March, 2014; v1 submitted 21 March, 2012;
originally announced March 2012.
-
Determinant Approximations
Authors:
Ilse C. F. Ipsen,
Dean J. Lee
Abstract:
A sequence of approximations for the determinant and its logarithm of a complex matrix is derived, along with relative error bounds. The determinant approximations are derived from expansions of det(X)=exp(trace(log(X))), and they apply to non-Hermitian matrices. Examples illustrate that these determinant approximations are efficient for lattice simulations of finite temperature nuclear matter, and that they use significantly less space than Gaussian elimination. The first approximation in the sequence is a block diagonal approximation; it represents an extension of Fischer's and Hadamard's inequalities to non-Hermitian matrices. In the special case of Hermitian positive-definite matrices, block diagonal approximations can be competitive with sparse inverse approximations. Finally, a different representation of sparse inverse approximations is given and it is shown that their accuracy increases as more matrix elements are included.
Submitted 2 May, 2011;
originally announced May 2011.
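One way to see how the identity det(X)=exp(trace(log(X))) in the abstract above can lead to a block diagonal first approximation is the following sketch. The notation is an assumption made for illustration and does not come from the abstract: X = D + E, where D is the nonsingular block diagonal part of X, E is the remaining off-diagonal part, and the spectral radius of D^{-1}E is less than 1.

```latex
\begin{align*}
\det(X) &= \det(D)\,\det\bigl(I + D^{-1}E\bigr), \\
\log\det\bigl(I + D^{-1}E\bigr)
  &= \operatorname{trace}\bigl(\log(I + D^{-1}E)\bigr)
   = \sum_{k=1}^{\infty} \frac{(-1)^{k+1}}{k}\,
     \operatorname{trace}\bigl((D^{-1}E)^{k}\bigr).
\end{align*}
```

The diagonal blocks of D^{-1}E are zero, so the k = 1 term vanishes. Dropping the terms with k ≥ 2 leaves det(X) ≈ det(D), and keeping more terms of the series would give successively refined approximations under these assumptions.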
-
La Budde's Method for Computing Characteristic Polynomials
Authors:
Rizwana Rehman,
Ilse C. F. Ipsen
Abstract:
La Budde's method computes the characteristic polynomial of a real matrix A in two stages: first it applies orthogonal similarity transformations to reduce A to upper Hessenberg form H, and second it computes the characteristic polynomial of H from characteristic polynomials of leading principal submatrices of H. If A is symmetric, then H is symmetric tridiagonal, and La Budde's method simplifies to the Sturm sequence method. If A is diagonal, then La Budde's method reduces to the Summation Algorithm, a Horner-like scheme used by the MATLAB function POLY to compute characteristic polynomials from eigenvalues. We present recursions to compute the individual coefficients of the characteristic polynomial in the second stage of La Budde's method, and derive running error bounds for symmetric and nonsymmetric matrices. We also show that La Budde's method can be more accurate than POLY, especially for indefinite and nonsymmetric matrices A. Unlike POLY, La Budde's method is not affected by ill-conditioning of eigenvalues, requires only real arithmetic, and allows the computation of individual coefficients.
Submitted 19 April, 2011;
originally announced April 2011.
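For the symmetric tridiagonal case mentioned in the abstract above, the characteristic polynomials of the leading principal submatrices satisfy the standard three-term recurrence p_k(λ) = (λ - a_k) p_{k-1}(λ) - b_{k-1}^2 p_{k-2}(λ), where the a_k are the diagonal entries and the b_k the off-diagonal entries. The sketch below applies that recurrence to coefficient lists. It is an illustration of the textbook recurrence only, not the recursions or running error bounds of the paper, and the function name is hypothetical.

```python
def tridiagonal_charpoly(diag, offdiag):
    """Coefficients of det(lambda*I - T), highest power first.

    T is the symmetric tridiagonal matrix with diagonal `diag` and
    off-diagonal `offdiag` (one entry fewer than `diag`).
    """
    if not diag:
        return [1.0]
    if len(offdiag) != len(diag) - 1:
        raise ValueError("offdiag must have len(diag) - 1 entries")
    prev = [1.0]                   # p_0(lambda) = 1
    curr = [1.0, -diag[0]]         # p_1(lambda) = lambda - a_1
    for k in range(1, len(diag)):
        times_lambda = curr + [0.0]                      # lambda * p_k
        times_a = [0.0] + [diag[k] * c for c in curr]    # a_{k+1} * p_k
        b_sq = offdiag[k - 1] ** 2
        times_b = [0.0, 0.0] + [b_sq * c for c in prev]  # b_k^2 * p_{k-1}
        nxt = [x - y - z for x, y, z in zip(times_lambda, times_a, times_b)]
        prev, curr = curr, nxt
    return curr


if __name__ == "__main__":
    # T = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
    print(tridiagonal_charpoly([2.0, 2.0, 2.0], [-1.0, -1.0]))
    # prints [1.0, -6.0, 10.0, -4.0]
```

Each intermediate list `curr` holds the characteristic polynomial of a leading principal submatrix, which is the quantity the two-stage description in the abstract builds on.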
-
Condition Estimates for Pseudo-Arclength Continuation
Authors:
K. I. Dickson,
C. T. Kelley,
I. C. F. Ipsen,
I. G. Kevrekidis
Abstract:
We bound the condition number of the Jacobian in pseudo arclength continuation problems, and we quantify the effect of this condition number on the linear system solution in a Newton GMRES solve.
In pseudo arclength continuation one repeatedly solves systems of nonlinear equations $F(u(s),λ(s))=0$ for a real-valued function $u$ and a real parameter $λ$, given different values of the arclength $s$. It is known that the Jacobian $F_x$ of $F$ with respect to $x=(u,λ)$ is nonsingular, if the path contains only regular points and simple fold singularities. We introduce a new characterization of simple folds in terms of the singular value decomposition, and we use it to derive a new bound for the norm of $F_x^{-1}$. We also show that the convergence rate of GMRES in a Newton step for $F(u(s),λ(s))=0$ is essentially the same as that of the original problem $G(u,λ)=0$. In particular we prove that the bounds on the degrees of the minimal polynomials of the Jacobians $F_x$ and $G_u$ differ by at most 2. We illustrate the effectiveness of our bounds with an example from radiative transfer theory.
Submitted 30 March, 2006;
originally announced March 2006.