-
Randomized strong rank-revealing QR for column subset selection and low-rank matrix approximation
Authors:
Laura Grigori,
Zhipeng Xue
Abstract:
We discuss a randomized strong rank-revealing QR factorization that effectively reveals the spectrum of a matrix $\textbf{M}$. This factorization can be used to address problems such as selecting a subset of the columns of $\textbf{M}$, computing its low-rank approximation, estimating its rank, or approximating its null space. Given a random sketching matrix $\pmb{Ω}$ that satisfies the $ε$-embedding property for a subspace within the range of $\textbf{M}$, the factorization relies on selecting columns that reveal the spectrum via a deterministic strong rank-revealing QR factorization of $\textbf{M}^{sk} = \pmb{Ω}\textbf{M}$, the sketch of $\textbf{M}$. We show that this selection leads to a factorization with strong rank-revealing properties, making it suitable for approximating the singular values of $\textbf{M}$.
Submitted 24 March, 2025;
originally announced March 2025.
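The column-selection idea in the abstract above can be illustrated with a minimal sketch. Two assumptions are made loudly here: the sketching matrix is a plain Gaussian map, and the deterministic strong rank-revealing QR is replaced by simple greedy column pivoting, which is a weaker stand-in and not the authors' algorithm. The names `gaussian_sketch` and `select_columns` are hypothetical.

```python
import math
import random


def gaussian_sketch(columns, sketch_dim, rng):
    """Apply a (sketch_dim x n) Gaussian map to each length-n column."""
    n = len(columns[0])
    omega = [[rng.gauss(0.0, 1.0) / math.sqrt(sketch_dim) for _ in range(n)]
             for _ in range(sketch_dim)]
    return [[sum(o * x for o, x in zip(row, col)) for row in omega]
            for col in columns]


def select_columns(columns, k, sketch_dim, seed=0):
    """Pick k column indices by greedy pivoting on the sketched columns.

    Assumption: greedy pivoting stands in for strong rank-revealing QR.
    """
    rng = random.Random(seed)
    residual = gaussian_sketch(columns, sketch_dim, rng)
    selected = []
    for _ in range(k):
        # Pivot: the unselected sketched column with the largest residual norm.
        norms = [sum(x * x for x in col) for col in residual]
        pivot = max((j for j in range(len(columns)) if j not in selected),
                    key=lambda j: norms[j])
        selected.append(pivot)
        # Deflate: remove the pivot direction from every sketched column.
        scale = math.sqrt(norms[pivot])
        q = [x / scale for x in residual[pivot]]
        deflated = []
        for col in residual:
            coeff = sum(a * b for a, b in zip(q, col))
            deflated.append([x - coeff * qi for x, qi in zip(col, q)])
        residual = deflated
    return selected


# Toy rank-2 matrix: columns 1 and 4 dominate, the rest are tiny combinations.
data_rng = random.Random(1)
u = [data_rng.gauss(0.0, 1.0) for _ in range(40)]
v = [data_rng.gauss(0.0, 1.0) for _ in range(40)]


def small(a, b):
    return [0.01 * (a * x + b * y) for x, y in zip(u, v)]


columns = [small(1, 1), [10 * x for x in u], small(1, -1),
           small(2, 1), [10 * y for y in v], small(1, 3)]
selected = select_columns(columns, k=2, sketch_dim=10)
```

All pivoting work happens on the 10-dimensional sketches rather than on the 40-dimensional columns, which is the source of the cost reduction; the selected indices are then applied to the original matrix.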
-
Randomized Implicitly Restarted Arnoldi method for the non-symmetric eigenvalue problem
Authors:
Jean-Guillaume de Damas,
Laura Grigori
Abstract:
In this paper, we introduce a randomized algorithm for solving the non-symmetric eigenvalue problem, referred to as randomized Implicitly Restarted Arnoldi (rIRA). This method relies on using a sketch-orthogonal basis during the Arnoldi process while maintaining the Arnoldi relation and exploiting a restarting scheme to focus on a specific part of the spectrum. We analyze this method and show that it retains useful properties of the Implicitly Restarted Arnoldi (IRA) method, such as restarting without adding errors to the Ritz pairs and implicitly applying polynomial filtering. Experiments are presented to validate the numerical efficiency of the proposed randomized eigenvalue solver.
Submitted 3 July, 2024;
originally announced July 2024.
-
Randomized Householder QR
Authors:
Laura Grigori,
Edouard Timsit
Abstract:
This paper introduces a randomized Householder QR factorization (RHQR). This factorization can be used to obtain a well-conditioned basis of a vector space and thus can be employed in a variety of applications. The RHQR factorization of the input matrix $W$ is equivalent to the standard Householder QR factorization of matrix $ΨW$, where $Ψ$ is a sketching matrix that can be obtained from any subspace embedding technique. For this reason, the RHQR factorization can also be reconstructed from the Householder QR factorization of the sketched problem, yielding a single-synchronization randomized QR factorization (recRHQR). In most contexts, left-looking RHQR requires a single synchronization per iteration, with half the computational cost of Householder QR, and a similar cost to Randomized Gram-Schmidt (RGS) overall. We discuss the use of the RHQR factorization in the Arnoldi process and then in GMRES, thus showing how it can be used in Krylov subspace methods to solve systems of linear equations. Based on Charles Sheffield's connection between Householder QR and Modified Gram-Schmidt (MGS), a BLAS2-RGS is also derived. A finite precision analysis shows that, under mild probabilistic assumptions, the RHQR factorization of the input matrix $W$ inherits the stability of the Householder QR factorization, producing a well-conditioned basis and a columnwise backward stable factorization, all independently of the condition number of the input $W$, and with the accuracy of the sketching step. We study the subsampled randomized Hadamard transform (SRHT) as a very stable sketching technique.
Numerical experiments show that RHQR produces a well-conditioned basis whose sketch is numerically orthogonal and an accurate factorization, even for the most difficult inputs and with high-dimensional operations made in half-precision.
Submitted 22 May, 2024; v1 submitted 17 May, 2024;
originally announced May 2024.
-
Randomized Numerical Linear Algebra : A Perspective on the Field With an Eye to Software
Authors:
Riley Murray,
James Demmel,
Michael W. Mahoney,
N. Benjamin Erichson,
Maksim Melnichenko,
Osman Asif Malik,
Laura Grigori,
Piotr Luszczek,
Michał Dereziński,
Miles E. Lopes,
Tianyu Liang,
Hengrui Luo,
Jack Dongarra
Abstract:
Randomized numerical linear algebra - RandNLA, for short - concerns the use of randomization as a resource to develop improved algorithms for large-scale linear algebra computations.
The origins of contemporary RandNLA lay in theoretical computer science, where it blossomed from a simple idea: randomization provides an avenue for computing approximate solutions to linear algebra problems more efficiently than deterministic algorithms. This idea proved fruitful in the development of scalable algorithms for machine learning and statistical data analysis applications. However, RandNLA's true potential only came into focus upon integration with the fields of numerical analysis and "classical" numerical linear algebra. Through the efforts of many individuals, randomized algorithms have been developed that provide full control over the accuracy of their solutions and that can be every bit as reliable as algorithms that might be found in libraries such as LAPACK. Recent years have even seen the incorporation of certain RandNLA methods into MATLAB, the NAG Library, NVIDIA's cuSOLVER, and SciKit-Learn.
For all its success, we believe that RandNLA has yet to realize its full potential. In particular, we believe the scientific community stands to benefit significantly from suitably defined "RandBLAS" and "RandLAPACK" libraries, to serve as standards conceptually analogous to BLAS and LAPACK. This 200-page monograph represents a step toward defining such standards. In it, we cover topics spanning basic sketching, least squares and optimization, low-rank approximation, full matrix decompositions, leverage score sampling, and sketching data with tensor product structures (among others). Much of the provided pseudo-code has been tested via publicly available MATLAB and Python implementations.
Submitted 12 April, 2023; v1 submitted 22 February, 2023;
originally announced February 2023.
-
Randomized Orthogonal Projection Methods for Krylov Subspace Solvers
Authors:
Edouard Timsit,
Laura Grigori,
Oleg Balabanov
Abstract:
Randomized orthogonal projection methods (ROPMs) can be used to speed up the computation of Krylov subspace methods in various contexts. Through a theoretical and numerical investigation, we establish that these methods produce quasi-optimal approximations over the Krylov subspace. Our numerical experiments outline the convergence of ROPMs for all matrices in our test set, with occasional spikes, but overall with a convergence rate similar to that of standard OPMs.
Submitted 10 March, 2023; v1 submitted 14 February, 2023;
originally announced February 2023.
-
Fast Exact Leverage Score Sampling from Khatri-Rao Products with Applications to Tensor Decomposition
Authors:
Vivek Bharadwaj,
Osman Asif Malik,
Riley Murray,
Laura Grigori,
Aydin Buluc,
James Demmel
Abstract:
We present a data structure to randomly sample rows from the Khatri-Rao product of several matrices according to the exact distribution of its leverage scores. Our proposed sampler draws each row in time logarithmic in the height of the Khatri-Rao product and quadratic in its column count, with persistent space overhead at most the size of the input matrices. As a result, it tractably draws samples even when the matrices forming the Khatri-Rao product have tens of millions of rows each. When used to sketch the linear least squares problems arising in CANDECOMP / PARAFAC tensor decomposition, our method achieves lower asymptotic complexity per solve than recent state-of-the-art methods. Experiments on billion-scale sparse tensors validate our claims, with our algorithm achieving higher accuracy than competing methods as the decomposition rank grows.
Submitted 28 February, 2024; v1 submitted 29 January, 2023;
originally announced January 2023.
-
Factorized structure of the long-range two-electron integrals tensor and its application in quantum chemistry
Authors:
Siwar Badreddine,
Igor Chollet,
Laura Grigori
Abstract:
We introduce two new approximation methods for the numerical evaluation of the long-range Coulomb potential and the approximation of the resulting high-dimensional Two-Electron Integrals tensor (TEI) with long-range interactions arising in molecular simulations. The first method exploits the tensorized structure of the compressed two-electron integrals obtained through two-dimensional Chebyshev interpolation combined with Gaussian quadrature. The second method is based on the Fast Multipole Method (FMM). Numerical experiments for different medium-size molecules on high-quality basis sets demonstrate the efficiency of the two methods. Detailed algorithms are provided in this paper, as well as a numerical comparison of the introduced approaches.
Submitted 24 October, 2022;
originally announced October 2022.
-
Block subsampled randomized Hadamard transform for low-rank approximation on distributed architectures
Authors:
Oleg Balabanov,
Matthias Beaupere,
Laura Grigori,
Victor Lederer
Abstract:
This article introduces a novel structured random matrix composed blockwise from subsampled randomized Hadamard transforms (SRHTs). The block SRHT is expected to outperform well-known dimension reduction maps, including SRHT and Gaussian matrices, on distributed architectures with not too many cores compared to the dimension. We prove that a block SRHT with enough rows is an oblivious subspace embedding, i.e., an approximate isometry for an arbitrary low-dimensional subspace with high probability. Our estimate of the required number of rows is similar to that of the standard SRHT. This suggests that the two transforms should provide the same accuracy of approximation in the algorithms. The block SRHT can be readily incorporated into randomized methods, for instance to compute a low-rank approximation of a large-scale matrix. For completeness, we revisit some common randomized approaches for this problem such as Randomized Singular Value Decomposition and Nyström approximation, with a discussion of their accuracy and implementation on distributed architectures.
Submitted 20 October, 2022;
originally announced October 2022.
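For orientation, the standard SRHT that the abstract above builds on can be sketched in a few lines. This shows only the single-block building block (random signs, a fast Walsh-Hadamard transform, then uniform row sampling); it is not the block construction proposed in the paper, and the function names `fwht` and `srht` are hypothetical.

```python
import math
import random


def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform; len(x) must be a power of two."""
    x = list(x)
    h = 1
    while h < len(x):
        for i in range(0, len(x), 2 * h):
            for j in range(i, i + h):
                x[j], x[j + h] = x[j] + x[j + h], x[j] - x[j + h]
        h *= 2
    return x


def srht(x, sketch_dim, rng):
    """Sketch x as sqrt(n / l) * R * (H / sqrt(n)) * D * x with l = sketch_dim."""
    n = len(x)
    signs = [rng.choice((-1.0, 1.0)) for _ in range(n)]     # diagonal D
    mixed = fwht([s * v for s, v in zip(signs, x)])         # unnormalized H D x
    rows = rng.sample(range(n), sketch_dim)                 # sampling matrix R
    scale = 1.0 / math.sqrt(sketch_dim)                     # sqrt(n/l) / sqrt(n)
    return [scale * mixed[i] for i in rows]


x = [1.0, -2.0, 3.0, 0.5, 0.0, 4.0, -1.5, 2.5]
full = srht(x, 8, random.Random(0))      # keeping every row preserves the norm exactly
reduced = srht(x, 4, random.Random(0))   # a genuine 8 -> 4 dimension reduction
```

When all rows are kept the map is orthogonal, so the norm is preserved exactly; with fewer rows the norm is only preserved approximately and with high probability, which is the embedding property the abstract refers to.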
-
A Directional Equispaced interpolation-based Fast Multipole Method for oscillatory kernels
Authors:
Igor Chollet,
Xavier Claeys,
Pierre Fortin,
Laura Grigori
Abstract:
Fast Multipole Methods (FMMs) based on the oscillatory Helmholtz kernel can reduce the cost of solving N-body problems arising from Boundary Integral Equations (BIEs) in acoustics or electromagnetics. However, their cost strongly increases in the high-frequency regime. This paper introduces a new directional FMM for oscillatory kernels (defmm - directional equispaced interpolation-based fmm), whose precomputation and application are FFT-accelerated due to polynomial interpolations on equispaced grids. We demonstrate the consistency of our FFT approach, and show how symmetries can be exploited in the Fourier domain. We also describe the algorithmic design of defmm, well-suited for the BIE non-uniform particle distributions, and present performance optimizations on one CPU core. Finally, we exhibit important performance gains on all test cases for defmm over a state-of-the-art FMM library for oscillatory kernels.
Submitted 10 February, 2022;
originally announced February 2022.
-
Randomized block Gram-Schmidt process for solution of linear systems and eigenvalue problems
Authors:
Oleg Balabanov,
Laura Grigori
Abstract:
This article introduces a randomized block Gram-Schmidt process (RBGS) for QR decomposition. RBGS extends the single-vector randomized Gram-Schmidt (RGS) algorithm and inherits its key characteristics such as being more efficient and having at least as much stability as any deterministic (block) Gram-Schmidt algorithm.
Block algorithms offer superior performance as they are based on BLAS3 matrix-wise operations and reduce communication cost when executed in parallel. Notably, our low-synchronization variant of RBGS can be implemented in a parallel environment using only one global reduction operation between processors per block. Moreover, the block Gram-Schmidt orthogonalization is the key element in the block Arnoldi procedure for the construction of a Krylov basis, which in turn is used in GMRES, FOM and Rayleigh-Ritz methods for the solution of linear systems and clustered eigenvalue problems. In this article, we develop randomized versions of these methods, based on RBGS, and validate them on nontrivial numerical examples.
Submitted 27 May, 2023; v1 submitted 29 November, 2021;
originally announced November 2021.
-
Randomized Gram-Schmidt process with application to GMRES
Authors:
Oleg Balabanov,
Laura Grigori
Abstract:
A randomized Gram-Schmidt algorithm is developed for orthonormalization of high-dimensional vectors or QR factorization. The proposed process can be less computationally expensive than the classical Gram-Schmidt process while being at least as numerically stable as the modified Gram-Schmidt process. Our approach is based on random sketching, which is a dimension reduction technique that estimates inner products of high-dimensional vectors by inner products of their small, efficiently computable random images, so-called sketches. In this way, an approximate orthogonality of the full vectors can be obtained by orthogonalization of their sketches. The proposed Gram-Schmidt algorithm can provide computational cost reduction in any architecture. The benefit of random sketching can be amplified by performing the non-dominant operations in higher precision. In this case, the numerical stability can be guaranteed with a working unit roundoff independent of the dimension of the problem. The proposed Gram-Schmidt process can be applied to the Arnoldi iteration and result in new Krylov subspace methods for solving high-dimensional systems of equations or eigenvalue problems. Among them, we chose the randomized GMRES method as a practical application of the methodology.
Submitted 18 January, 2022; v1 submitted 10 November, 2020;
originally announced November 2020.
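The mechanism described in the abstract above (orthogonalizing the small sketches instead of the full vectors) can be illustrated as follows. This is a simplified sketch under stated assumptions, not the authors' implementation: the sketching matrix is Gaussian, and the projection coefficients are taken as plain inner products with the already orthonormal sketches, which in exact arithmetic coincides with a small least-squares solve. The names `dot` and `randomized_gram_schmidt` are hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def randomized_gram_schmidt(w_cols, sketch_dim, seed=0):
    """Return (Q columns, R, sketches of Q) with W = Q R and orthonormal sketches."""
    rng = random.Random(seed)
    n, m = len(w_cols[0]), len(w_cols)
    # Assumption: a dense Gaussian sketching matrix Theta of size sketch_dim x n.
    theta = [[rng.gauss(0.0, 1.0) / math.sqrt(sketch_dim) for _ in range(n)]
             for _ in range(sketch_dim)]

    def sketch(v):
        return [dot(row, v) for row in theta]

    q_cols, s_cols = [], []
    r = [[0.0] * m for _ in range(m)]
    for i, w in enumerate(w_cols):
        p = sketch(w)
        # Projection coefficients are computed from the small sketches only.
        coeffs = [dot(s, p) for s in s_cols]
        q = list(w)
        for c, qj in zip(coeffs, q_cols):
            q = [x - c * y for x, y in zip(q, qj)]
        s = sketch(q)
        norm = math.sqrt(dot(s, s))      # normalize in the sketched norm
        for j, c in enumerate(coeffs):
            r[j][i] = c
        r[i][i] = norm
        q_cols.append([x / norm for x in q])
        s_cols.append([x / norm for x in s])
    return q_cols, r, s_cols


data_rng = random.Random(42)
w_cols = [[data_rng.gauss(0.0, 1.0) for _ in range(50)] for _ in range(4)]
q_cols, r, s_cols = randomized_gram_schmidt(w_cols, sketch_dim=20)

# Orthonormality holds for the sketches; the full vectors are only approximately orthonormal.
gram_error = max(abs(dot(s_cols[i], s_cols[j]) - (1.0 if i == j else 0.0))
                 for i in range(4) for j in range(4))
recon_error = max(abs(w_cols[i][k] - sum(r[j][i] * q_cols[j][k] for j in range(4)))
                  for i in range(4) for k in range(50))
```

The inner products that drive the process involve vectors of length 20 rather than 50; in a realistic setting the gap between the sketch dimension and the ambient dimension is far larger.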
-
Accelerating linear system solvers for time domain component separation of cosmic microwave background data
Authors:
J. Papež,
L. Grigori,
R. Stompor
Abstract:
Component separation is one of the key stages of any modern cosmic microwave background (CMB) data analysis pipeline. It is an inherently non-linear procedure and typically involves a series of sequential solutions of linear systems with similar, albeit not identical, system matrices, derived for different data models of the same data set. Sequences of this kind arise for instance in the maximization of the data likelihood with respect to foreground parameters or sampling of their posterior distribution. However, they are also common in many other contexts. In this work we consider solving the component separation problem directly in the measurement (time) domain, which can have a number of important advantages over the more standard pixel-based methods, in particular if non-negligible time-domain noise correlations are present, as is commonly the case. The time-domain based approach implies, however, significant computational effort due to the need to manipulate the full volume of the time-domain data set. To address this challenge, we propose and study efficient solvers adapted to solving time-domain-based component separation systems and their sequences, and which are capable of capitalizing on information derived from the previous solutions. This is achieved either via adapting the initial guess of the subsequent system or through so-called subspace recycling, which allows the construction of progressively more efficient two-level preconditioners. We report an overall speed-up over solving the systems independently of a factor of nearly 7, or 5, in the worked examples inspired respectively by the likelihood maximization and likelihood sampling procedures we consider in this work.
Submitted 1 June, 2020; v1 submitted 7 February, 2020;
originally announced February 2020.
-
An improved analysis and unified perspective on deterministic and randomized low rank matrix approximations
Authors:
James Demmel,
Laura Grigori,
Alexander Rusciano
Abstract:
We introduce a Generalized LU-Factorization (GLU) for low-rank matrix approximation. We relate this to past approaches and extensively analyze its approximation properties. The established deterministic guarantees are combined with sketching ensembles satisfying Johnson-Lindenstrauss properties to present complete bounds. Particularly good performance is shown for the sub-sampled randomized Hadamard transform (SRHT) ensemble. Moreover, the factorization is shown to unify and generalize many past algorithms. It also helps to explain the effect of sketching on the growth factor during Gaussian Elimination.
Submitted 1 October, 2019;
originally announced October 2019.
-
Solving linear equations with messenger-field and conjugate gradients techniques - an application to CMB data analysis
Authors:
J. Papez,
L. Grigori,
R. Stompor
Abstract:
We discuss linear system solvers invoking a messenger-field and compare them with (preconditioned) conjugate gradients approaches. We show that the messenger-field techniques correspond to fixed point iterations of an appropriately preconditioned initial system of linear equations. We then argue that a conjugate gradient solver applied to the same preconditioned system, or equivalently a preconditioned conjugate gradient solver using the same preconditioner and applied to the original system, will in general ensure at least a comparable and typically better performance in terms of the number of iterations to convergence and time-to-solution. We illustrate our conclusions on two common examples drawn from the Cosmic Microwave Background data analysis: Wiener filtering and map-making. In addition, and contrary to the standard lore in the CMB field, we show that the performance of the preconditioned conjugate gradient solver can depend significantly on the starting vector. This observation seems of particular importance in the cases of map-making of high signal-to-noise sky maps and therefore should be of relevance for the next generation of CMB experiments.
Submitted 22 October, 2018; v1 submitted 9 March, 2018;
originally announced March 2018.
-
URV Factorization with Random Orthogonal System Mixing
Authors:
Stephen Becker,
James Folberth,
Laura Grigori
Abstract:
The unpivoted and pivoted Householder QR factorizations are ubiquitous in numerical linear algebra. A difficulty with pivoted Householder QR is the communication bottleneck introduced by pivoting. In this paper we propose using random orthogonal systems to quickly mix together the columns of a matrix before computing an unpivoted QR factorization. This method computes a URV factorization which forgoes expensive pivoted QR steps in exchange for mixing in advance, followed by a cheaper, unpivoted QR factorization. The mixing step typically reduces the variability of the column norms, and in certain experiments, allows us to compute an accurate factorization where a plain, unpivoted QR performs poorly. We experiment with linear least-squares, rank-revealing factorizations, and the QLP approximation, and conclude that our randomized URV factorization behaves comparably to a similar randomized rank-revealing URV factorization, but at a fraction of the computational cost. Our experiments provide evidence that our proposed factorization might be rank-revealing with high probability.
Submitted 7 March, 2017;
originally announced March 2017.
-
Exploiting Multiple Levels of Parallelism in Sparse Matrix-Matrix Multiplication
Authors:
Ariful Azad,
Grey Ballard,
Aydin Buluc,
James Demmel,
Laura Grigori,
Oded Schwartz,
Sivan Toledo,
Samuel Williams
Abstract:
Sparse matrix-matrix multiplication (or SpGEMM) is a key primitive for many high-performance graph algorithms as well as for some linear solvers, such as algebraic multigrid. The scaling of existing parallel implementations of SpGEMM is heavily bound by communication. Even though 3D (or 2.5D) algorithms have been proposed and theoretically analyzed in the flat MPI model on Erdos-Renyi matrices, those algorithms had not been implemented in practice and their complexities had not been analyzed for the general case. In this work, we present the first ever implementation of the 3D SpGEMM formulation that also exploits multiple (intra-node and inter-node) levels of parallelism, achieving significant speedups over the state-of-the-art publicly available codes at all levels of concurrency. We extensively evaluate our implementation and identify bottlenecks that should be subject to further research.
Submitted 16 November, 2016; v1 submitted 3 October, 2015;
originally announced October 2015.
-
LU factorization with panel rank revealing pivoting and its communication avoiding version
Authors:
Amal Khabou,
James W. Demmel,
Laura Grigori,
Ming Gu
Abstract:
We present the LU decomposition with panel rank revealing pivoting (LU_PRRP), an LU factorization algorithm based on strong rank revealing QR panel factorization. LU_PRRP is more stable than Gaussian elimination with partial pivoting (GEPP). Our extensive numerical experiments show that the new factorization scheme is as numerically stable as GEPP in practice, but it is more resistant to pathological cases and easily solves the Wilkinson matrix and the Foster matrix. We also present CALU_PRRP, a communication avoiding version of LU_PRRP that minimizes communication. CALU_PRRP is based on tournament pivoting, with the selection of the pivots at each step of the tournament being performed via strong rank revealing QR factorization. CALU_PRRP is more stable than CALU, the communication avoiding version of GEPP. CALU_PRRP is also more stable in practice and is resistant to pathological cases on which GEPP and CALU fail.
Submitted 12 August, 2012;
originally announced August 2012.
-
Solution of the optimal assignment problem by diagonal scaling algorithms
Authors:
Meisam Sharify,
Stéphane Gaubert,
Laura Grigori
Abstract:
We show that a solution of the optimal assignment problem can be obtained as the limit of the solution of an entropy maximization problem, as a deformation parameter tends to infinity. This allows us to apply entropy maximization algorithms to the optimal assignment problem. In particular, the Sinkhorn algorithm leads to a parallelizable method, which can be used as a preprocessing step to handle large dense optimal assignment problems. This parallel preprocessing allows one to delete entries which do not belong to optimal permutations, leading to a reduced instance which becomes solvable with limited memory requirements.
Submitted 4 November, 2013; v1 submitted 19 April, 2011;
originally announced April 2011.
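A small illustration of the idea in the abstract above: Sinkhorn row and column scaling applied to a sharpened positive matrix concentrates its mass on an optimal permutation. The kernel `exp(p * w_ij)`, the parameter value `p = 5`, the iteration count, and the function names are assumptions made for this sketch, not details taken from the paper.

```python
import itertools
import math


def sinkhorn_assignment(weights, p=5.0, iterations=2000):
    """Approximate the max-weight assignment by Sinkhorn scaling of exp(p * w)."""
    n = len(weights)
    top = max(max(row) for row in weights)
    # Positive kernel exp(p * (w - max)); subtracting the max avoids overflow.
    x = [[math.exp(p * (w - top)) for w in row] for row in weights]
    for _ in range(iterations):
        row_sums = [sum(row) for row in x]
        x = [[x[i][j] / row_sums[i] for j in range(n)] for i in range(n)]
        col_sums = [sum(x[i][j] for i in range(n)) for j in range(n)]
        x = [[x[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    # Read off the dominant entry of each row of the scaled matrix.
    return [max(range(n), key=lambda j: row[j]) for row in x]


def brute_force_assignment(weights):
    """Reference answer by enumerating all permutations (small n only)."""
    n = len(weights)
    best = max(itertools.permutations(range(n)),
               key=lambda perm: sum(weights[i][perm[i]] for i in range(n)))
    return list(best)


weights = [[4.0, 1.0, 3.0],
           [2.0, 0.0, 5.0],
           [3.0, 2.0, 2.0]]
approx = sinkhorn_assignment(weights)
exact = brute_force_assignment(weights)
```

Each scaling pass is a set of independent row and column sums, which is what makes the method parallelizable. Entries of the scaled matrix that stay close to zero are candidates for deletion in the preprocessing described above; a larger `p` sharpens the separation but may require more iterations in this naive form.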
-
Generalized Filtering Decomposition
Authors:
Laura Grigori,
Frédéric Nataf
Abstract:
This paper introduces a new preconditioning technique that is suitable for matrices arising from the discretization of a system of PDEs on unstructured grids. The preconditioner satisfies a so-called filtering property, which ensures that the input matrix is identical with the preconditioner on a given filtering vector. This vector is chosen to alleviate the effect of low frequency modes on convergence and so decrease or eliminate the plateau which is often observed in the convergence of iterative methods. In particular, the paper presents a general approach for ensuring that the filtering condition is satisfied in a matrix decomposition. The input matrix can have an arbitrary sparse structure. Hence, it can be reordered using nested dissection, to allow a parallel computation of the preconditioner and of the iterative process.
Submitted 15 March, 2011;
originally announced March 2011.
-
Implementing Communication-Optimal Parallel and Sequential QR Factorizations
Authors:
James Demmel,
Laura Grigori,
Mark Hoemmen,
Julien Langou
Abstract:
We present parallel and sequential dense QR factorization algorithms for tall and skinny matrices and general rectangular matrices that both minimize communication, and are as stable as Householder QR. The sequential and parallel algorithms for tall and skinny matrices lead to significant speedups in practice over some of the existing algorithms, including LAPACK and ScaLAPACK, for example up to 6.7x over ScaLAPACK. The parallel algorithm for general rectangular matrices is estimated to show significant speedups over ScaLAPACK, up to 22x over ScaLAPACK.
Submitted 14 September, 2008;
originally announced September 2008.
-
Communication-optimal parallel and sequential QR and LU factorizations
Authors:
James Demmel,
Laura Grigori,
Mark Hoemmen,
Julien Langou
Abstract:
We present parallel and sequential dense QR factorization algorithms that are both optimal (up to polylogarithmic factors) in the amount of communication they perform, and just as stable as Householder QR.
We prove optimality by extending known lower bounds on communication bandwidth for sequential and parallel matrix multiplication to provide latency lower bounds, and show these bounds apply to the LU and QR decompositions. We not only show that our QR algorithms attain these lower bounds (up to polylogarithmic factors), but that existing LAPACK and ScaLAPACK algorithms perform asymptotically more communication. We also point out recent LU algorithms in the literature that attain at least some of these lower bounds.
Submitted 19 August, 2008;
originally announced August 2008.
-
Communication-optimal parallel and sequential QR and LU factorizations: theory and practice
Authors:
James Demmel,
Laura Grigori,
Mark Hoemmen,
Julien Langou
Abstract:
We present parallel and sequential dense QR factorization algorithms that are both optimal (up to polylogarithmic factors) in the amount of communication they perform, and just as stable as Householder QR. Our first algorithm, Tall Skinny QR (TSQR), factors m-by-n matrices in a one-dimensional (1-D) block cyclic row layout, and is optimized for m >> n. Our second algorithm, CAQR (Communication-Avoiding QR), factors general rectangular matrices distributed in a two-dimensional block cyclic layout. It invokes TSQR for each block column factorization.
Submitted 29 August, 2008; v1 submitted 12 June, 2008;
originally announced June 2008.
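The reduction at the heart of TSQR, as described in the abstract above, can be sketched for the R factor alone. Two simplifications are assumed: the local factorizations use modified Gram-Schmidt as a short stand-in for the Householder QR used in the paper, and the reduction is a single flat step rather than a tree. The names `qr_r` and `tsqr_r` are hypothetical.

```python
import random


def qr_r(a):
    """R factor of a tall full-rank matrix (list of rows) via modified Gram-Schmidt."""
    m, n = len(a), len(a[0])
    cols = [[a[i][j] for i in range(m)] for j in range(n)]
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        r[j][j] = sum(x * x for x in cols[j]) ** 0.5
        q = [x / r[j][j] for x in cols[j]]
        for k in range(j + 1, n):
            r[j][k] = sum(qi * x for qi, x in zip(q, cols[k]))
            cols[k] = [x - r[j][k] * qi for x, qi in zip(cols[k], q)]
    return r


def tsqr_r(a, block_rows):
    """Factor each row block locally, then factor the stacked local R factors."""
    blocks = [a[i:i + block_rows] for i in range(0, len(a), block_rows)]
    local_rs = [qr_r(block) for block in blocks]   # independent, one per processor
    stacked = [row for local in local_rs for row in local]
    return qr_r(stacked)                           # single reduction step


rng = random.Random(0)
a = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(12)]
r_tsqr = tsqr_r(a, block_rows=4)
r_direct = qr_r(a)
max_diff = max(abs(r_tsqr[i][j] - r_direct[i][j])
               for i in range(3) for j in range(3))
```

Only the small n-by-n local R factors are exchanged between blocks, never the tall blocks themselves, which is why the approach suits the m >> n case. Each block needs at least n rows and full column rank for this sketch to work.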