arXiv:2412.19322

Mixed-precision numerics in scientific applications: survey and perspectives

Authors: Aditya Kashi, Hao Lu, Wesley Brewer, David Rogers, Michael Matheson, Mallikarjun Shankar, Feiyi Wang

Abstract: The explosive demand for artificial intelligence (AI) workloads has led to a significant increase in silicon area dedicated to lower-precision computations on recent high-performance computing hardware designs. However, mixed-precision capabilities, which can achieve performance improvements of 8x compared to double-precision in extreme compute-intensive workloads, remain largely untapped in most scientific applications. A growing number of efforts have shown that mixed-precision algorithmic innovations can deliver superior performance without sacrificing accuracy. These developments should prompt computational scientists to seriously consider whether their scientific modeling and simulation applications could benefit from the acceleration offered by new hardware and mixed-precision algorithms. In this article, we review the literature on relevant applications, existing mixed-precision algorithms, theories, and the available software infrastructure. We then offer our perspective and recommendations on the potential of mixed-precision algorithms to enhance the performance of scientific simulation applications. Broadly, we find that mixed-precision methods can have a large impact on computational science in terms of time-to-solution and energy consumption. This is true not only for a few arithmetic-dominated applications but also, to a more moderate extent, for the many memory bandwidth-bound applications. In many cases, though, the choice of algorithms and regions of applicability will be domain-specific, and thus require input from domain experts. It is helpful to identify cross-cutting computational motifs and their mixed-precision algorithms in this regard. Finally, there are new algorithms being developed to utilize AI hardware and AI methods to accelerate first-principles computational science, and these should be closely watched as hardware platforms evolve.

Submitted 7 January, 2025; v1 submitted 26 December, 2024; originally announced December 2024.

Comments: Submitted to IJHPCA

MSC Class: 65Y10

ACM Class: J.2

arXiv:2406.16740

Learning the boundary-to-domain mapping using Lifting Product Fourier Neural Operators for partial differential equations

Authors: Aditya Kashi, Arka Daw, Muralikrishnan Gopalakrishnan Meena, Hao Lu

Abstract: Neural operators such as the Fourier Neural Operator (FNO) have been shown to provide resolution-independent deep learning models that can learn mappings between function spaces. For example, an initial condition can be mapped to the solution of a partial differential equation (PDE) at a future time-step using a neural operator. Despite the popularity of neural operators, their use to predict solution functions over a domain given only data over the boundary (such as a spatially varying Dirichlet boundary condition) remains unexplored. In this paper, we refer to such problems as boundary-to-domain problems; they have a wide range of applications in areas such as fluid mechanics, solid mechanics, and heat transfer. We present a novel FNO-based architecture, named Lifting Product FNO (LP-FNO), which can map arbitrary boundary functions defined on the lower-dimensional boundary to a solution in the entire domain. Specifically, two FNOs defined on the lower-dimensional boundary are lifted into the higher-dimensional domain using our proposed lifting product layer. We demonstrate the efficacy and resolution independence of the proposed LP-FNO for the 2D Poisson equation.

Submitted 1 July, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

Comments: Accepted by ICML 2024 AI for Science Workshop

MSC Class: 65N99; 68T07

ACM Class: I.2.1; J.2

arXiv:1912.00539

An asynchronous incomplete block LU preconditioner for computational fluid dynamics on unstructured grids

Authors: Aditya Kashi, Siva Nadarajah

Abstract: We present a study of the effectiveness of asynchronous incomplete LU factorization preconditioners for the time-implicit solution of compressible flow problems while exploiting thread-parallelism within a compute node. A block variant of the asynchronous fine-grain parallel preconditioner adapted to a finite volume discretization of the compressible Navier-Stokes equations on unstructured grids is presented, and convergence theory is extended to the new variant. Experimental (numerical) results on the performance of these preconditioners on inviscid and viscous laminar two-dimensional steady-state test cases are reported. It is found, for these compressible flow problems, that the block variant performs much better in terms of convergence, parallel scalability and reliability than the original scalar asynchronous ILU preconditioner. For viscous flow, it is found that the ordering of unknowns may determine the success or failure of asynchronous block-ILU preconditioning, and an ordering of grid cells suitable for solving viscous problems is presented.

Submitted 4 October, 2020; v1 submitted 1 December, 2019; originally announced December 2019.

Comments: Accepted by SIAM SISC

MSC Class: 65F08; 65Y05; 65N22

Showing 1–3 of 3 results for author: Kashi, A