Trilinos: Enabling Scientific Computing Across Diverse Hardware Architectures at Scale

Authors: Matthias Mayr, Alexander Heinlein, Christian Glusa, Siva Rajamanickam, Maarten Arnst, Roscoe Bartlett, Luc Berger-Vergiat, Erik Boman, Karen Devine, Graham Harper, Michael Heroux, Mark Hoemmen, Jonathan Hu, Brian Kelley, Kyungjoo Kim, Drew P. Kouri, Paul Kuberry, Kim Liegeois, Curtis C. Ober, Roger Pawlowski, Carl Pearson, Mauro Perego, Eric Phipps, Denis Ridzal, Nathan V. Roberts, et al. (8 additional authors not shown)

Abstract: Trilinos is a community-developed, open-source software framework that facilitates building large-scale, complex, multiscale, multiphysics simulation code bases for scientific and engineering problems. Since the Trilinos framework has undergone substantial changes to support new applications and new hardware architectures, this document is an update to "An Overview of the Trilinos project" by Heroux et al. (ACM Transactions on Mathematical Software, 31(3):397-423, 2005). It describes the design of Trilinos, introduces its new organization in product areas, and highlights established and new features available in Trilinos. Particular focus is put on the modernized software stack based on the Kokkos ecosystem to deliver performance portability across heterogeneous hardware architectures. This paper also outlines the organization of the Trilinos community and the contribution model to help onboard interested users and contributors.

Submitted 11 March, 2025; originally announced March 2025.

Comments: 32 pages, 1 figure

Report number: SAND2025-02891O

MSC Class: 65-04; 65Y05

ACM Class: G.4; G.1.3

arXiv:2104.01196

Two-Stage Gauss-Seidel Preconditioners and Smoothers for Krylov Solvers on a GPU cluster

Authors: Luc Berger-Vergiat, Brian Kelley, Sivasankaran Rajamanickam, Jonathan Hu, Katarzyna Swirydowicz, Paul Mullowney, Stephen Thomas, Ichitaro Yamazaki

Abstract: Gauss-Seidel (GS) relaxation is often employed as a preconditioner for a Krylov solver or as a smoother for Algebraic Multigrid (AMG). However, the requisite sparse triangular solve is difficult to parallelize on many-core architectures such as graphics processing units (GPUs). In the present study, the performance of the traditional GS relaxation based on a triangular solve is compared with two-stage variants, replacing the direct triangular solve with a fixed number of inner Jacobi-Richardson (JR) iterations. When a small number of inner iterations is sufficient to maintain the Krylov convergence rate, the two-stage GS (GS2) often outperforms the traditional algorithm on many-core architectures. We also compare GS2 with JR. When they perform the same number of flops for SpMV (e.g. three JR sweeps compared to two GS sweeps with one inner JR sweep), the GS2 iterations, and the Krylov solver preconditioned with GS2, may converge faster than the JR iterations. Moreover, for some problems (e.g. elasticity), it was found that JR may diverge with a damping factor of one, whereas two-stage GS may improve the convergence with more inner iterations. Finally, to study the performance of the two-stage smoother and preconditioner for a practical problem, these were applied to incompressible fluid flow simulations on GPUs.

Submitted 24 April, 2021; v1 submitted 2 April, 2021; originally announced April 2021.
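
The two-stage idea in the abstract above, replacing the triangular solve of a Gauss-Seidel sweep with a fixed number of inner Jacobi-Richardson iterations, can be illustrated with a minimal sketch. This is not the authors' implementation: the dense list-of-rows storage, the function names, the test matrix, and the choice of D^{-1} r as the starting guess for the inner iteration are all assumptions made for illustration.

```python
def residual(A, b, x):
    """Return r = b - A x for a dense matrix stored as a list of rows."""
    return [bi - sum(aij * xj for aij, xj in zip(row, x)) for row, bi in zip(A, b)]


def gs_sweep(A, b, x):
    """One classical forward Gauss-Seidel sweep: x <- x + (D + L)^{-1} r.

    The loop over i is inherently sequential, because row i needs g[j] for j < i.
    """
    n = len(A)
    r = residual(A, b, x)
    g = [0.0] * n
    for i in range(n):
        g[i] = (r[i] - sum(A[i][j] * g[j] for j in range(i))) / A[i][i]
    return [xi + gi for xi, gi in zip(x, g)]


def gs2_sweep(A, b, x, inner_sweeps):
    """One two-stage sweep: x <- x + g, where g only approximates (D + L)^{-1} r."""
    n = len(A)
    r = residual(A, b, x)
    # Assumed starting guess for the inner solve: g = D^{-1} r.
    g = [r[i] / A[i][i] for i in range(n)]
    for _ in range(inner_sweeps):
        # Jacobi-Richardson update for (D + L) g = r: g <- D^{-1} (r - L g).
        # Only the previous g is read, so all rows could be updated in parallel.
        g = [(r[i] - sum(A[i][j] * g[j] for j in range(i))) / A[i][i] for i in range(n)]
    return [xi + gi for xi, gi in zip(x, g)]


def solve(A, b, sweeps, inner_sweeps):
    """Apply a fixed number of two-stage sweeps starting from x = 0."""
    x = [0.0] * len(b)
    for _ in range(sweeps):
        x = gs2_sweep(A, b, x, inner_sweeps)
    return x


# Hypothetical diagonally dominant test system whose exact solution is (1, 1, 1).
A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
b = [3.0, 2.0, 3.0]
x = solve(A, b, sweeps=50, inner_sweeps=1)
```

Because the strictly lower-triangular part is nilpotent, n - 1 inner sweeps reproduce the exact triangular solve for an n-by-n matrix; the practical interest lies in using far fewer inner sweeps, each of which is a parallel matrix-vector product rather than a sequential substitution.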

arXiv:2103.11991

Kokkos Kernels: Performance Portable Sparse/Dense Linear Algebra and Graph Kernels

Authors: Sivasankaran Rajamanickam, Seher Acer, Luc Berger-Vergiat, Vinh Dang, Nathan Ellingwood, Evan Harvey, Brian Kelley, Christian R. Trott, Jeremiah Wilke, Ichitaro Yamazaki

Abstract: As hardware architectures are evolving in the push towards exascale, developing Computational Science and Engineering (CSE) applications depends on performance portable approaches for sustainable software development. This paper describes one aspect of performance portability with respect to developing a portable library of kernels that serve the needs of several CSE applications and software frameworks. We describe Kokkos Kernels, a library of sparse linear algebra, dense linear algebra, and graph kernels. We describe the design principles of such a library and demonstrate portable performance of the library using some selected kernels. Specifically, we demonstrate the performance of four sparse kernels, three dense batched kernels, two graph kernels and one team level algorithm.

Submitted 22 March, 2021; originally announced March 2021.

Report number: SAND2021-3421 O

arXiv:2103.11962

doi 10.1137/20M1375413

Non-invasive multigrid for semi-structured grids

Authors: Matthias Mayr, Luc Berger-Vergiat, Peter Ohm, Raymond S. Tuminaro

Abstract: Multigrid solvers for hierarchical hybrid grids (HHG) have been proposed to promote the efficient utilization of high performance computer architectures. These HHG meshes are constructed by uniformly refining a relatively coarse fully unstructured mesh. While HHG meshes provide some flexibility for unstructured applications, most multigrid calculations can be accomplished using efficient structured grid ideas and kernels. This paper focuses on generalizing the HHG idea so that it is applicable to a broader community of computational scientists, and so that it is easier for existing applications to leverage structured multigrid components. Specifically, we adapt the structured multigrid methodology to significantly more complex semi-structured meshes. Further, we illustrate how mature applications might adopt a semi-structured solver in a relatively non-invasive fashion. To do this, we propose a formal mathematical framework for describing the semi-structured solver. This formalism allows us to precisely define the associated multigrid method and to show its relationship to a more traditional multigrid solver. Additionally, the mathematical framework clarifies the associated software design and implementation. Numerical experiments highlight the relationship of the new solver with classical multigrid. We also demonstrate the generality and potential performance gains associated with this type of semi-structured multigrid.

Submitted 22 March, 2021; originally announced March 2021.

Report number: SAND2021-3211 O

Showing 1–4 of 4 results for author: Berger-Vergiat, L