-
Vectorised Parallel in Time methods for low-order discretizations with application to Porous Media problems
Authors:
Christian Engwer,
Alexander Schell,
Nils-Arne Dreier
Abstract:
High-order methods have shown great potential to overcome performance issues in simulations of partial differential equations (PDEs) on modern hardware, yet many users still rely on low-order, matrix-based simulations, in particular in porous media applications. Heterogeneous coefficients and the low regularity of the solution are reasons not to employ high-order discretizations. We present a new approach for the simulation of instationary PDEs that partially mitigates these performance problems. By reformulating the original problem, we derive a parallel-in-time integrator that increases the arithmetic intensity and introduces additional structure into the problem, which helps accelerate matrix-based simulations on modern hardware architectures. Based on a system for multiple time steps, we formulate a matrix equation that can be solved using vectorised solvers such as Block Krylov methods. The structure of this approach makes it applicable to a wide range of linear and nonlinear problems. In our numerical experiments, we present first results for three different PDEs: a linear convection-diffusion equation, a nonlinear diffusion-reaction equation, and a realistic example based on Richards' equation.
Submitted 2 April, 2025;
originally announced April 2025.
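As a minimal illustration of the "system for multiple time steps" idea, here is a sketch under stated assumptions rather than the authors' method. It assumes a linear model problem u' = -Au + f discretized with implicit Euler. In that setting, k time steps can be collected into one matrix equation K U - U S = B. Here K = I + dt*A, the columns of U are the states u_1, ..., u_k, and S is a column-shift matrix. The model problem, the dense 1D Laplacian and the function names are hypothetical choices for this example. The naive sweep below does more arithmetic than ordinary time stepping. Its only purpose is to show that each sweep is a single solve with k right-hand sides, which is the kind of operation that block Krylov solvers can vectorise.

```python
import numpy as np

def laplacian_1d(n):
    """Hypothetical model operator: dense 1D finite-difference Laplacian (Dirichlet)."""
    h = 1.0 / (n + 1)
    A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    return A / h**2

def sequential_euler(A, u0, F, dt):
    """Reference: k implicit Euler steps (I + dt*A) u_{j+1} = u_j + dt*f_j, one after another."""
    n, k = F.shape
    K = np.eye(n) + dt * A
    U = np.empty((n, k))
    u = u0
    for j in range(k):
        u = np.linalg.solve(K, u + dt * F[:, j])
        U[:, j] = u
    return U

def all_at_once_euler(A, u0, F, dt):
    """Solve the same k steps as the matrix equation K U - U S = B.

    Column j of U holds u_{j+1}; (U @ S)[:, j] = U[:, j-1] couples each step
    to its predecessor. Every sweep is ONE solve with k right-hand sides.
    """
    n, k = F.shape
    K = np.eye(n) + dt * A
    S = np.eye(k, k=1)          # shift: (U @ S)[:, j] = U[:, j-1], zero for j = 0
    B = dt * F                  # new array, F is not modified
    B[:, 0] += u0               # the initial condition enters the first column
    U = np.zeros((n, k))
    for _ in range(k):          # S is nilpotent (S^k = 0), so k sweeps give the exact solution
        U = np.linalg.solve(K, B + U @ S)
    return U
```

Both functions return the same n x k array of states. In a real parallel-in-time solver, the inner multi-RHS solve would be replaced by a block Krylov method and the sweep by a better-converging iteration.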
-
A Hardware-aware and Stable Orthogonalization Framework
Authors:
Nils-Arne Dreier,
Christian Engwer
Abstract:
The orthogonalization process is an essential building block in Krylov space methods and takes up a large portion of the computational time. Commonly used methods, such as the Gram-Schmidt method, treat projection and normalization separately and store the orthogonal basis explicitly. We treat orthogonalization and normalization together as a QR decomposition problem, to which we apply known algorithms, namely CholeskyQR and TSQR. This leads to methods that solve the orthogonalization problem with reduced communication costs while maintaining stability, and that store the orthogonal basis in a locally orthogonal representation. Furthermore, we discuss the novel method as a framework that allows us to combine different orthogonalization algorithms and to use the best algorithm for each part of the hardware. After formulating the methods, we show their advantageous performance properties based on a performance model that takes into account data transfers within compute nodes as well as message passing between compute nodes. The theoretical results are validated by numerical experiments.
Submitted 28 April, 2022;
originally announced April 2022.
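The CholeskyQR building block named in the abstract can be sketched in a few lines. The sketch assumes a tall matrix V of s basis vectors stored as row blocks, one per hypothetical process. Only the small s x s Gram matrix needs a global reduction (one allreduce in an MPI setting), and this is where the communication savings come from. The sketch uses the common CholeskyQR2 variant, which applies CholeskyQR twice to recover orthogonality lost to rounding. It does not reproduce the paper's combined framework or its locally orthogonal representation. The function names are illustrative.

```python
import numpy as np

def cholesky_qr(V_blocks):
    """CholeskyQR of a tall matrix given as a list of row blocks.

    Returns the row blocks of Q and the s x s upper-triangular R with V = Q R.
    """
    # Local Gram matrices summed up: stands in for a single global allreduce.
    G = sum(Vb.T @ Vb for Vb in V_blocks)
    # G = L L^T with L lower triangular, so R = L^T gives G = R^T R.
    R = np.linalg.cholesky(G).T
    # Q_b = V_b R^{-1}, computed as (R^{-T} V_b^T)^T without forming an inverse.
    Q_blocks = [np.linalg.solve(R.T, Vb.T).T for Vb in V_blocks]
    return Q_blocks, R

def cholesky_qr2(V_blocks):
    """CholeskyQR applied twice: V = Q1 R1 = Q2 (R2 R1)."""
    Q1, R1 = cholesky_qr(V_blocks)
    Q2, R2 = cholesky_qr(Q1)
    return Q2, R2 @ R1
```

Plain CholeskyQR loses orthogonality roughly with the square of the condition number of V and fails outright if the Gram matrix is not numerically positive definite. This is one reason the abstract pairs it with the unconditionally stable TSQR.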
-
Hardware-Oriented Krylov Methods for High-Performance Computing
Authors:
Nils-Arne Dreier
Abstract:
Krylov subspace methods are an essential building block in numerical simulation software. The efficient utilization of modern hardware is a challenging problem in the development of these methods. In this work, we develop Krylov subspace methods for solving linear systems with multiple right-hand sides, tailored to modern hardware in high-performance computing. To this end, we analyze an innovative block Krylov subspace framework that allows the computational and data-transfer costs to be balanced according to the hardware. Based on this framework, we formulate commonly used Krylov methods. For the CG and BiCGStab methods, we introduce a novel stabilization approach as an alternative to a deflation strategy. This allows us to retain the block size, leading to a simpler and more efficient implementation. In addition, we further optimize the methods for distributed memory systems and their communication overhead. For the CG method, we analyze approaches to overlap communication and computation and present multiple variants of the CG method that differ in their communication properties. Furthermore, we present optimizations of the orthogonalization procedure in the GMRes method. Besides introducing a pipelined Gram-Schmidt variant that overlaps the global communication with the computation of inner products, we present a novel orthonormalization method based on the TSQR algorithm, which is communication-optimal and stable. For all optimized methods, we present tests that show their superiority in a distributed setting.
Submitted 6 April, 2021;
originally announced April 2021.
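The TSQR algorithm that the orthonormalization method above builds on can be sketched as a one-level reduction. The sketch assumes row blocks that stand in for processes, each with at least s rows. Each block is factorized locally without communication, and only the small stacked R factors are combined. This is a plain illustration of TSQR itself, not the novel method of this work. A production TSQR would use a reduction tree and would usually keep Q in implicit form. The function name is hypothetical.

```python
import numpy as np

def tsqr(V_blocks):
    """One-level TSQR of a tall matrix given as row blocks (each with >= s rows).

    Returns the row blocks of Q and the s x s upper-triangular R with V = Q R.
    """
    s = V_blocks[0].shape[1]
    # Step 1: independent local (reduced) QR factorizations, no communication.
    local = [np.linalg.qr(Vb) for Vb in V_blocks]
    # Step 2: gather the small s x s R factors and factorize the stacked (p*s) x s matrix.
    R_stack = np.vstack([Rb for _, Rb in local])
    Q_top, R = np.linalg.qr(R_stack)
    # Step 3: each block's Q is its local Q times its s x s slice of Q_top,
    # since V_b = Q_b R_b and R_b = Q_top[b] R.
    Q_blocks = [Qb @ Q_top[i * s:(i + 1) * s] for i, (Qb, _) in enumerate(local)]
    return Q_blocks, R
```

Because only Householder-based QR factorizations are used, the computed Q stays orthogonal to working precision regardless of the conditioning of V. This is the stability property referred to in the abstract.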
-
Strategies for the vectorized Block Conjugate Gradients method
Authors:
Nils-Arne Dreier,
Christian Engwer
Abstract:
Block Krylov methods have recently gained a lot of attention. Due to their increased arithmetic intensity, they offer a promising way to improve performance on modern hardware. Recently, Frommer et al. presented a block Krylov framework that combines the advantages of block Krylov methods and data-parallel methods. We review this framework and apply it to the Block Conjugate Gradients method to solve linear systems with multiple right-hand sides. In doing so, we consider challenges that occur on modern hardware, such as limited memory bandwidth, the use of SIMD instructions, and communication overhead. We present a performance model to predict the efficiency of different Block CG variants and compare its predictions with experimental numerical results.
Submitted 26 December, 2019;
originally announced December 2019.
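For orientation, the classical Block Conjugate Gradients iteration (O'Leary's formulation) for a symmetric positive definite matrix A and an n x s block of right-hand sides B can be sketched as follows. This is a plain NumPy sketch. It is not the Frommer et al. framework or any of the vectorised variants analysed in the paper, and it has no safeguard against rank deficiency of the block residual. The function name and the stopping rule are assumptions. Every operation acts on n x s blocks or s x s matrices, which is the source of the increased arithmetic intensity mentioned in the abstract.

```python
import numpy as np

def block_cg(A, B, tol=1e-10, maxiter=500):
    """Block CG for SPD A and an n x s block of right-hand sides B (O'Leary-style)."""
    X = np.zeros_like(B)
    R = B - A @ X                      # block residual, n x s
    P = R.copy()                       # block search direction, n x s
    RtR = R.T @ R                      # s x s block inner product
    for _ in range(maxiter):
        # Stop once the Frobenius norm of the block residual is small relative to B.
        if np.linalg.norm(R) <= tol * np.linalg.norm(B):
            break
        Q = A @ P                      # one operator application on s vectors at once
        alpha = np.linalg.solve(P.T @ Q, RtR)      # s x s step "length"
        X += P @ alpha
        R -= Q @ alpha
        RtR_new = R.T @ R
        beta = np.linalg.solve(RtR, RtR_new)       # s x s update coefficient
        P = R + P @ beta
        RtR = RtR_new
    return X
```

If two residual columns become (nearly) linearly dependent, the s x s solves become ill-conditioned. Deflation or stabilization strategies such as those studied in this line of work address exactly this case.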
-
Exa-Dune -- Flexible PDE Solvers, Numerical Methods and Applications
Authors:
Peter Bastian,
Mirco Altenbernd,
Nils-Arne Dreier,
Christian Engwer,
Jorrit Fahlke,
René Fritze,
Markus Geveler,
Dominik Göddeke,
Oleg Iliev,
Olaf Ippisch,
Jan Mohring,
Steffen Müthing,
Mario Ohlberger,
Dirk Ribbrock,
Nikolay Shegunov,
Stefan Turek
Abstract:
In the Exa-Dune project, we have developed, implemented and optimised numerical algorithms and software for the scalable solution of partial differential equations (PDEs) on future exascale systems exhibiting a heterogeneous, massively parallel architecture. To cope with the increased probability of hardware failures, one aim of the project was to add flexible, application-oriented resilience capabilities to the framework. Continuous improvements of the underlying hardware-oriented numerical methods have included GPU-based sparse approximate inverses, matrix-free sum-factorisation for high-order discontinuous Galerkin discretisations, and partially matrix-free preconditioners. On top of that, additional scalability is facilitated by exploiting the massive coarse-grained parallelism offered by multiscale and uncertainty quantification methods. Here we have focused on the adaptive choice of the coarse/fine scale and the overlap region, as well as on the combination of local reduced basis multiscale methods and the multilevel Monte Carlo algorithm. Finally, some of the concepts are applied in a land-surface model including subsurface flow and surface runoff.
Submitted 6 November, 2019; v1 submitted 4 November, 2019;
originally announced November 2019.