Showing 1–15 of 15 results for author: Grigori, L

Searching in archive cs.
  1. arXiv:2506.15488

    cs.DC

    Minimizing Communication for Parallel Symmetric Tensor Times Same Vector Computation

    Authors: Hussam Al Daas, Grey Ballard, Laura Grigori, Suraj Kumar, Kathryn Rouse, Mathieu Vérité

    Abstract: In this article, we focus on the parallel communication cost of multiplying the same vector along two modes of a 3-dimensional symmetric tensor. This is a key computation in the higher-order power method for determining eigenpairs of a 3-dimensional symmetric tensor and in gradient-based methods for computing a symmetric CP decomposition. We establish communication lower bounds that determine…

    Submitted 18 June, 2025; originally announced June 2025.

    Comments: 19 pages, 1 figure

  2. arXiv:2409.11304

    cs.DC

    Communication Lower Bounds and Optimal Algorithms for Symmetric Matrix Computations

    Authors: Hussam Al Daas, Grey Ballard, Laura Grigori, Suraj Kumar, Kathryn Rouse, Mathieu Vérité

    Abstract: In this article, we focus on the communication costs of three symmetric matrix computations: i) multiplying a matrix with its transpose, known as a symmetric rank-k update (SYRK); ii) adding the result of the multiplication of a matrix with the transpose of another matrix and the transpose of that result, known as a symmetric rank-2k update (SYR2K); iii) performing matrix multiplication with a symme…

    Submitted 17 September, 2024; originally announced September 2024.

    Comments: 43 pages, 6 figures. To be published in ACM Transactions on Parallel Computing

  3. arXiv:2302.11474

    math.NA cs.MS math.OC

    Randomized Numerical Linear Algebra : A Perspective on the Field With an Eye to Software

    Authors: Riley Murray, James Demmel, Michael W. Mahoney, N. Benjamin Erichson, Maksim Melnichenko, Osman Asif Malik, Laura Grigori, Piotr Luszczek, Michał Dereziński, Miles E. Lopes, Tianyu Liang, Hengrui Luo, Jack Dongarra

    Abstract: Randomized numerical linear algebra - RandNLA, for short - concerns the use of randomization as a resource to develop improved algorithms for large-scale linear algebra computations. The origins of contemporary RandNLA lay in theoretical computer science, where it blossomed from a simple idea: randomization provides an avenue for computing approximate solutions to linear algebra problems more ef…

    Submitted 12 April, 2023; v1 submitted 22 February, 2023; originally announced February 2023.

    Comments: v1: this is the first arXiv release of LAPACK Working Note 299. v2: complete rewrite of the subsection on trace estimation, among other changes. See frontmatter page ii (pdf page 5) for revision history

  4. arXiv:2210.11295

    math.NA cs.DS

    Block subsampled randomized Hadamard transform for low-rank approximation on distributed architectures

    Authors: Oleg Balabanov, Matthias Beaupere, Laura Grigori, Victor Lederer

    Abstract: This article introduces a novel structured random matrix composed blockwise from subsampled randomized Hadamard transforms (SRHTs). The block SRHT is expected to outperform well-known dimension reduction maps, including SRHT and Gaussian matrices, on distributed architectures with not too many cores compared to the dimension. We prove that a block SRHT with enough rows is an oblivious subspace emb…

    Submitted 20 October, 2022; originally announced October 2022.

    Journal ref: Proceedings of the International Conference on Machine Learning, pp. 1564-1576. PMLR, 2023

  5. arXiv:2207.10437

    cs.DC

    Communication Lower Bounds and Optimal Algorithms for Multiple Tensor-Times-Matrix Computation

    Authors: Hussam Al Daas, Grey Ballard, Laura Grigori, Suraj Kumar, Kathryn Rouse

    Abstract: Multiple Tensor-Times-Matrix (Multi-TTM) is a key computation in algorithms for computing and operating with the Tucker tensor decomposition, which is frequently used in multidimensional data analysis. We establish communication lower bounds that determine how much data movement is required to perform the Multi-TTM computation in parallel. The crux of the proof relies on analytically solving a con…

    Submitted 2 February, 2023; v1 submitted 21 July, 2022; originally announced July 2022.

  6. arXiv:2205.13407

    cs.DC

    Tight Memory-Independent Parallel Matrix Multiplication Communication Lower Bounds

    Authors: Hussam Al Daas, Grey Ballard, Laura Grigori, Suraj Kumar, Kathryn Rouse

    Abstract: Communication lower bounds have long been established for matrix multiplication algorithms. However, most methods of asymptotic analysis have either ignored the constant factors or not obtained the tightest possible values. Recent work has demonstrated that more careful analysis improves the best known constants for some classical matrix multiplication lower bounds and helps to identify more effic…

    Submitted 26 May, 2022; originally announced May 2022.

  7. arXiv:1910.00223

    math.NA cs.DS

    An improved analysis and unified perspective on deterministic and randomized low rank matrix approximations

    Authors: James Demmel, Laura Grigori, Alexander Rusciano

    Abstract: We introduce a Generalized LU-Factorization (GLU) for low-rank matrix approximation. We relate this to past approaches and extensively analyze its approximation properties. The established deterministic guarantees are combined with sketching ensembles satisfying Johnson-Lindenstrauss properties to present complete bounds. Particularly good performance is shown for the sub-sampled randomiz…

    Submitted 1 October, 2019; originally announced October 2019.

    MSC Class: 15-04

  8. arXiv:1905.11340

    cs.LG stat.ML

    Parallel and Communication Avoiding Least Angle Regression

    Authors: S. Das, J. Demmel, K. Fountoulakis, L. Grigori, M. W. Mahoney, S. Yang

    Abstract: We are interested in parallelizing the Least Angle Regression (LARS) algorithm for fitting linear regression models to high-dimensional data. We consider two parallel and communication avoiding versions of the basic LARS algorithm. The two algorithms have different asymptotic costs and practical performance. One offers more speedup and the other produces more accurate output. The first is bLARS, a…

    Submitted 12 September, 2020; v1 submitted 27 May, 2019; originally announced May 2019.

    Comments: 21 pages, 8 figures, 3 tables

  9. arXiv:1805.05278

    cs.DC

    A 3D Parallel Algorithm for QR Decomposition

    Authors: Grey Ballard, James Demmel, Laura Grigori, Mathias Jacquelin, Nicholas Knight

    Abstract: Interprocessor communication often dominates the runtime of large matrix computations. We present a parallel algorithm for computing QR decompositions whose bandwidth cost (communication volume) can be decreased at the cost of increasing its latency cost (number of messages). By varying a parameter to navigate the bandwidth/latency tradeoff, we can tune this algorithm for machines with different c…

    Submitted 14 May, 2018; originally announced May 2018.

  10. arXiv:1510.00844

    cs.DC math.NA

    Exploiting Multiple Levels of Parallelism in Sparse Matrix-Matrix Multiplication

    Authors: Ariful Azad, Grey Ballard, Aydin Buluc, James Demmel, Laura Grigori, Oded Schwartz, Sivan Toledo, Samuel Williams

    Abstract: Sparse matrix-matrix multiplication (or SpGEMM) is a key primitive for many high-performance graph algorithms as well as for some linear solvers, such as algebraic multigrid. The scaling of existing parallel implementations of SpGEMM is heavily bound by communication. Even though 3D (or 2.5D) algorithms have been proposed and theoretically analyzed in the flat MPI model on Erdős–Rényi matrices, th…

    Submitted 16 November, 2016; v1 submitted 3 October, 2015; originally announced October 2015.

    Journal ref: SIAM Journal on Scientific Computing, Volume 38, Number 6, pp. C624-C651, 2016

  11. arXiv:1408.3048

    astro-ph.CO cs.DC physics.comp-ph

    Accelerating Cosmic Microwave Background map-making procedure through preconditioning

    Authors: Mikolaj Szydlarski, Laura Grigori, Radek Stompor

    Abstract: Estimation of the sky signal from sequences of time ordered data is one of the key steps in Cosmic Microwave Background (CMB) data analysis, commonly referred to as the map-making problem. Some of the most popular and general methods proposed for this problem involve solving generalised least squares (GLS) equations with non-diagonal noise weights given by a block-diagonal matrix with Toeplitz blo…

    Submitted 15 December, 2014; v1 submitted 13 August, 2014; originally announced August 2014.

    Comments: 19 pages // Final version submitted to A&A

    ACM Class: G.4; I.6; J.2

    Journal ref: A&A 572, A39 (2014)

  12. arXiv:1303.5837

    cs.DC

    Multilevel communication optimal LU and QR factorizations for hierarchical platforms

    Authors: Laura Grigori, Mathias Jacquelin, Amal Khabou

    Abstract: This study focuses on the performance of two classical dense linear algebra algorithms, the LU and the QR factorizations, on multilevel hierarchical platforms. We first introduce a new model called Hierarchical Cluster Platform (HCP), encapsulating the characteristics of such platforms. The focus is set on reducing the communication requirements of studied algorithms at each level of the hierarchy…

    Submitted 23 March, 2013; originally announced March 2013.

    Report number: RR-8270

  13. arXiv:1110.2677

    cs.DC

    Hybrid static/dynamic scheduling for already optimized dense matrix factorization

    Authors: Simplice Donfack, Laura Grigori, William D. Gropp, Vivek Kale

    Abstract: We present the use of a hybrid static/dynamic scheduling strategy of the task dependency graph for direct methods used in dense numerical linear algebra. This strategy provides a balance of data locality, load balance, and low dequeue overhead. We show that the usage of this scheduling in communication avoiding dense factorization leads to significant performance gains. On a 48 core AMD Opteron NU…

    Submitted 12 October, 2011; originally announced October 2011.

  14. arXiv:1106.0159

    cs.DC astro-ph.CO physics.ao-ph physics.comp-ph physics.geo-ph

    Parallel Spherical Harmonic Transforms on heterogeneous architectures (GPUs/multi-core CPUs)

    Authors: Mikolaj Szydlarski, Pierre Esterie, Joel Falcou, Laura Grigori, R. Stompor

    Abstract: Spherical Harmonic Transforms (SHT) are at the heart of many scientific and practical applications ranging from climate modelling to cosmological observations. In many of these areas new, cutting-edge science goals have been recently proposed requiring simulations and analyses of experimental or observational data at very high resolutions and of unprecedented volumes. Both these aspects pose formi…

    Submitted 1 April, 2013; v1 submitted 1 June, 2011; originally announced June 2011.

  15. Spherical harmonic transform with GPUs

    Authors: Ioan O. Hupca, Joel Falcou, Laura Grigori, Radek Stompor

    Abstract: We describe an algorithm for computing an inverse spherical harmonic transform suitable for graphics processing units (GPUs). We use CUDA and base our implementation on a Fortran90 routine included in a publicly available parallel package, S2HAT. We focus our attention on the two major sequential steps involved in the transform computation, retaining the efficient parallel framework of the original…

    Submitted 6 October, 2010; originally announced October 2010.

    Report number: INRIA technical report 7409

    Journal ref: Proceedings of Euro-Par 2011, Lecture Notes in Computer Science, 2012, Vol. 7155/2012, p. 355