Showing 1–2 of 2 results for author: Kailasa, S

Search v0.5.6 released 2020-02-24

arXiv:2408.07436

cs.CE

M2L Translation Operators for Kernel Independent Fast Multipole Methods on Modern Architectures

Authors: Srinath Kailasa, Timo Betcke, Sarah El Kazdadi

Abstract: Hardware trends favor algorithm designs that maximize data reuse per FLOP. We develop and benchmark high-performance Multipole-to-Local (M2L) translation operators for the kernel-independent Fast Multipole Method (kiFMM), a widely adopted FMM variant that supports a broad class of kernels and has been favored by recent implementations for its simple specification. Naively implemented, M2L is bandwidth-limited and therefore a key bottleneck in the FMM. State-of-the-art FFT-based M2L implementations, though elegant and with a fast setup time, suffer from low operational intensity and require architecture-specific optimizations. We demonstrate that a BLAS-based M2L, combined with randomized low-rank compression, achieves competitive performance with greater portability and a simpler implementation leveraging existing BLAS infrastructure, at the cost of higher setup times, especially for high-accuracy settings in double precision. Our Rust-based implementation enables seamless switching between strategies for fair benchmarking. Results on CPUs show that FFT-based M2L is favorable in low-accuracy settings or dynamic particle simulations, while BLAS-based M2L is favored for high-accuracy settings for static particle distributions, where its higher setup costs are amortized in many practical applications of the FMM.

Submitted 28 May, 2025; v1 submitted 14 August, 2024; originally announced August 2024.

Comments: 32 pages, 6 figures

MSC Class: 65-04; 65Y05; 65Y10; 65Y15; 65Y20

arXiv:2303.08394

cs.SE

doi 10.1109/MCSE.2023.3258288

PyExaFMM: an exercise in designing high-performance software with Python and Numba

Authors: Srinath Kailasa, Tingyu Wang, Lorena A. Barba, Timo Betcke

Abstract: Numba is a game-changing compiler for high-performance computing with Python. It produces machine code that runs outside of the single-threaded Python interpreter and that fully utilizes the resources of modern CPUs. This means support for parallel multithreading and auto vectorization if available, as with compiled languages such as C++ or Fortran. In this article we document our experience developing PyExaFMM, a multithreaded Numba implementation of the Fast Multipole Method, an algorithm with a non-linear data structure and a large amount of data organization. We find that designing performant Numba code for complex algorithms can be as challenging as writing in a compiled language.

Submitted 13 April, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

Comments: 10 pages, 3 figures

MSC Class: 68-04

ACM Class: D.2.2

Journal ref: Computing in Science & Engineering, vol. 24, no. 05, pp. 77-84, 2022
