Skip to main content

Showing 1–9 of 9 results for author: Yamazaki, I

Searching in archive cs. Search in all archives.
.
  1. arXiv:2503.16717  [pdf, other

    math.NA cs.DC

    Random-sketching Techniques to Enhance the Numerical Stability of Block Orthogonalization Algorithms for s-step GMRES

    Authors: Ichitaro Yamazaki, Andrew J. Higgins, Erik G. Boman, Daniel B. Szyld

    Abstract: We integrate random sketching techniques into block orthogonalization schemes needed for s-step GMRES. The resulting block orthogonalization schemes generate the basis vectors whose overall orthogonality error is bounded by machine precision as long as each of the corresponding block vectors are numerically full rank. We implement these randomized block orthogonalization schemes using standard dis… ▽ More

    Submitted 28 April, 2025; v1 submitted 20 March, 2025; originally announced March 2025.

    Comments: 16 pages

  2. arXiv:2503.08126  [pdf, other

    cs.MS math.NA

    Trilinos: Enabling Scientific Computing Across Diverse Hardware Architectures at Scale

    Authors: Matthias Mayr, Alexander Heinlein, Christian Glusa, Siva Rajamanickam, Maarten Arnst, Roscoe Bartlett, Luc Berger-Vergiat, Erik Boman, Karen Devine, Graham Harper, Michael Heroux, Mark Hoemmen, Jonathan Hu, Brian Kelley, Kyungjoo Kim, Drew P. Kouri, Paul Kuberry, Kim Liegeois, Curtis C. Ober, Roger Pawlowski, Carl Pearson, Mauro Perego, Eric Phipps, Denis Ridzal, Nathan V. Roberts , et al. (8 additional authors not shown)

    Abstract: Trilinos is a community-developed, open-source software framework that facilitates building large-scale, complex, multiscale, multiphysics simulation code bases for scientific and engineering problems. Since the Trilinos framework has undergone substantial changes to support new applications and new hardware architectures, this document is an update to ``An Overview of the Trilinos project'' by He… ▽ More

    Submitted 11 March, 2025; originally announced March 2025.

    Comments: 32 pages, 1 figure

    Report number: SAND2025-02891O MSC Class: 65-04; 65Y05 ACM Class: G.4; G.1.3

  3. arXiv:2402.15033  [pdf, other

    math.NA cs.DC

    Two-Stage Block Orthogonalization to Improve Performance of $s$-step GMRES

    Authors: Ichitaro Yamazaki, Andrew J. Higgins, Erik G. Boman, Daniel B. Szyld

    Abstract: On current computer architectures, GMRES' performance can be limited by its communication cost to generate orthonormal basis vectors of the Krylov subspace. To address this performance bottleneck, its $s$-step variant orthogonalizes a block of $s$ basis vectors at a time, potentially reducing the communication cost by a factor of $s$. Unfortunately, for a large step size $s$, the solver can genera… ▽ More

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: Accepted for publication in IPDPS'24

  4. arXiv:2304.04876  [pdf, other

    math.NA cs.DC cs.MS

    An Experimental Study of Two-Level Schwarz Domain Decomposition Preconditioners on GPUs

    Authors: Ichitaro Yamazaki, Alexander Heinlein, Sivasankaran Rajamanickam

    Abstract: The generalized Dryja--Smith--Widlund (GDSW) preconditioner is a two-level overlapping Schwarz domain decomposition (DD) preconditioner that couples a classical one-level overlapping Schwarz preconditioner with an energy-minimizing coarse space. When used to accelerate the convergence rate of Krylov subspace iterative methods, the GDSW preconditioner provides robustness and scalability for the sol… ▽ More

    Submitted 10 April, 2023; originally announced April 2023.

    Comments: Accepted for publication in IPDPS'23

  5. arXiv:2109.01232  [pdf, other

    cs.DC cs.MS math.NA

    A Study of Mixed Precision Strategies for GMRES on GPUs

    Authors: Jennifer A. Loe, Christian A. Glusa, Ichitaro Yamazaki, Erik G. Boman, Sivasankaran Rajamanickam

    Abstract: Support for lower precision computation is becoming more common in accelerator hardware due to lower power usage, reduced data movement and increased computational performance. However, computational science and engineering (CSE) problems require double precision accuracy in several domains. This conflict between hardware trends and application needs has resulted in a need for mixed precision stra… ▽ More

    Submitted 2 September, 2021; originally announced September 2021.

    Comments: arXiv admin note: substantial text overlap with arXiv:2105.07544

  6. arXiv:2105.07544  [pdf, other

    math.NA cs.MS

    Experimental Evaluation of Multiprecision Strategies for GMRES on GPUs

    Authors: Jennifer A. Loe, Christian A. Glusa, Ichitaro Yamazaki, Erik G. Boman, Sivasankaran Rajamanickam

    Abstract: Support for lower precision computation is becoming more common in accelerator hardware due to lower power usage, reduced data movement and increased computational performance. However, computational science and engineering (CSE) problems require double precision accuracy in several domains. This conflict between hardware trends and application needs has resulted in a need for multiprecision strat… ▽ More

    Submitted 16 May, 2021; originally announced May 2021.

    Comments: Accepted for publication in the IEEE IPDPS Accelerators and Hybrid Emerging Systems (AsHES) 11th Workshop, 2021

  7. arXiv:2104.01196  [pdf, other

    math.NA cs.MS

    Two-Stage Gauss--Seidel Preconditioners and Smoothers for Krylov Solvers on a GPU cluster

    Authors: Luc Berger-Vergiat, Brian Kelley, Sivasankaran Rajamanickam, Jonathan Hu, Katarzyna Swirydowicz, Paul Mullowney, Stephen Thomas, Ichitaro Yamazaki

    Abstract: Gauss-Seidel (GS) relaxation is often employed as a preconditioner for a Krylov solver or as a smoother for Algebraic Multigrid (AMG). However, the requisite sparse triangular solve is difficult to parallelize on many-core architectures such as graphics processing units (GPUs). In the present study, the performance of the traditional GS relaxation based on a triangular solve is compared with two-s… ▽ More

    Submitted 24 April, 2021; v1 submitted 2 April, 2021; originally announced April 2021.

  8. arXiv:2103.11991  [pdf, other

    cs.MS

    Kokkos Kernels: Performance Portable Sparse/Dense Linear Algebra and Graph Kernels

    Authors: Sivasankaran Rajamanickam, Seher Acer, Luc Berger-Vergiat, Vinh Dang, Nathan Ellingwood, Evan Harvey, Brian Kelley, Christian R. Trott, Jeremiah Wilke, Ichitaro Yamazaki

    Abstract: As hardware architectures are evolving in the push towards exascale, developing Computational Science and Engineering (CSE) applications depend on performance portable approaches for sustainable software development. This paper describes one aspect of performance portability with respect to developing a portable library of kernels that serve the needs of several CSE applications and software frame… ▽ More

    Submitted 22 March, 2021; originally announced March 2021.

    Report number: SAND2021-3421 O

  9. arXiv:2007.06674  [pdf, other

    cs.MS math.NA

    A Survey of Numerical Methods Utilizing Mixed Precision Arithmetic

    Authors: Ahmad Abdelfattah, Hartwig Anzt, Erik G. Boman, Erin Carson, Terry Cojean, Jack Dongarra, Mark Gates, Thomas Grützmacher, Nicholas J. Higham, Sherry Li, Neil Lindquist, Yang Liu, Jennifer Loe, Piotr Luszczek, Pratik Nayak, Sri Pranesh, Siva Rajamanickam, Tobias Ribizel, Barry Smith, Kasia Swirydowicz, Stephen Thomas, Stanimire Tomov, Yaohung M. Tsai, Ichitaro Yamazaki, Urike Meier Yang

    Abstract: Within the past years, hardware vendors have started designing low precision special function units in response to the demand of the Machine Learning community and their demand for high compute power in low precision formats. Also the server-line products are increasingly featuring low-precision special function units, such as the NVIDIA tensor cores in ORNL's Summit supercomputer providing more t… ▽ More

    Submitted 13 July, 2020; originally announced July 2020.

    Comments: Technical report as a part of the Exascale computing project (ECP)

    ACM Class: G.1.3; G.4