Search | arXiv e-print repository

arXiv:2407.00485 [pdf, ps, other]

Error Analysis and Parallel Scaling Study of A Parareal Parallel-in-Time Integration Algorithm for Particle-in-Fourier Schemes

Authors: Sriramkrishnan Muralikrishnan, Robert Speck

Abstract: We propose a parareal based time parallelization scheme in the phase-space for the particle-in-Fourier (PIF) discretization of the Vlasov-Poisson system used in kinetic plasma simulations. We use PIF with a coarse tolerance for the nonuniform fast Fourier transforms, or the standard particle-in-cell scheme, combined with temporal coarsening, as coarse propagators. This is different from the typical spatial coarsening of particles and/or Fourier modes for parareal, which are not possible or effective for PIF schemes. We perform an error analysis of the algorithm and verify the results numerically with Landau damping, two-stream instability, and Penning trap test cases in 3D-3V. We also implement the space-time parallelization of the PIF schemes in the open-source, performance-portable library IPPL and conduct scaling studies up to 1536 A100 GPUs on the JUWELS booster supercomputer. The space-time parallelization utilizing the parareal algorithm for the time parallelization provides up to $4-6$ times speedup compared to spatial parallelization alone and achieves a push rate of around 1 billion particles per second for the benchmark plasma mini-apps considered.

Submitted 15 June, 2025; v1 submitted 29 June, 2024; originally announced July 2024.

Comments: 26 pages, 12 figures, This is the accepted version in SIAM Journal on Scientific Computing

MSC Class: 35Q83; 65M75; 82D10

arXiv:2405.02603 [pdf, ps, other]

doi 10.1145/3748815

A Massively Parallel Performance Portable Free-space Spectral Poisson Solver

Authors: Sonali Mayani, Veronica Montanaro, Antoine Cerfon, Matthias Frey, Sriramkrishnan Muralikrishnan, Andreas Adelmann

Abstract: Vico et al. (2016) suggest a fast algorithm for computing volume potentials, beneficial to fields with problems requiring the solution of the free-space Poisson's equation, such as beam and plasma physics. Currently, the standard is the algorithm of Hockney and Eastwood (1988), with second order in convergence at best. The algorithm proposed by Vico et al. converges spectrally for sufficiently smooth functions i.e. faster than any fixed order in the number of grid points. We implement a performance portable version of the traditional Hockney-Eastwood and the novel Vico-Greengard Poisson solver as part of the IPPL (Independent Parallel Particle Layer) library. For sufficiently smooth source functions, the Vico-Greengard algorithm achieves higher accuracy than the Hockney-Eastwood method with the same grid size, reducing the computational demands of high resolution simulations since one could use coarser grids to achieve them. Additionally, we propose an improvement to the Vico-Greengard method which further reduces its memory footprint. This is important for GPUs, which have limited memory, and should be taken into account when selecting numerical algorithms for performance portable codes. Finally, we showcase performance through GPU and CPU scaling studies on the Perlmutter (NERSC) supercomputer, with efficiencies staying above 50% in the strong scaling case. To showcase portability, we also run the scaling studies on the Alps supercomputer at CSCS, Switzerland and the GPU partition of the Lumi supercomputer at CSC, Finland.

Submitted 29 July, 2025; v1 submitted 4 May, 2024; originally announced May 2024.

Comments: 29 pages, 19 figures

arXiv:2205.11052 [pdf, other]

Scaling and performance portability of the particle-in-cell scheme for plasma physics applications through mini-apps targeting exascale architectures

Authors: Sriramkrishnan Muralikrishnan, Matthias Frey, Alessandro Vinciguerra, Michael Ligotino, Antoine J. Cerfon, Miroslav Stoyanov, Rahulkumar Gayatri, Andreas Adelmann

Abstract: We perform a scaling and performance portability study of the particle-in-cell scheme for plasma physics applications through a set of mini-apps we name "Alpine", which can make use of exascale computing capabilities. The mini-apps are based on Independent Parallel Particle Layer, a framework that is designed around performance portable and dimension independent particles and fields. We benchmark the simulations with varying parameters such as grid resolutions ($512^3$ to $2048^3$) and number of simulation particles ($10^9$ to $10^{11}$) with the following mini-apps: weak and strong Landau damping, bump-on-tail and two-stream instabilities, and the dynamics of an electron bunch in a charge-neutral Penning trap. We show strong and weak scaling and analyze the performance of different components on several pre-exascale architectures such as Piz-Daint, Cori, Summit and Perlmutter. While the scaling and portability study helps identify the performance critical components of the particle-in-cell scheme in the current state-of-the-art computing architectures, the mini-apps by themselves can be used to develop new algorithms and optimize their high performance implementations targeting exascale architectures.

Submitted 2 November, 2022; v1 submitted 23 May, 2022; originally announced May 2022.

arXiv:2103.09352 [pdf, other]

doi 10.1088/1367-2630/ac5001

Order-of-Magnitude Beam Current Improvement in Compact Cyclotrons

Authors: Daniel Winklehner, Andreas Adelmann, Janet M. Conrad, Sonali Mayani, Sriramkrishnan Muralikrishnan, Devin Schoen, Maria Yampolskaya

Abstract: There is great need for high intensity proton beams from compact particle accelerators in particle physics, medical isotope production, and materials- and energy-research. To address this need, we present, for the first time, a design for a compact isochronous cyclotron that will be able to deliver 10 mA of 60 MeV protons - an order of magnitude higher than on-market compact cyclotrons and a factor four higher than research machines. A key breakthrough is that vortex motion is incorporated in the design of a cyclotron, leading to clean extraction. Beam losses on the septa of the electrostatic extraction channels stay below 50 W (a factor four below the required safety limit), while maintaining good beam quality. We present a set of highly accurate particle-in-cell simulations, and an uncertainty quantification of select beam input parameters using machine learning, showing the robustness of the design. This design can be utilized for beams for experiments in particle and nuclear physics, materials science and medical physics as well as for industrial applications.

Submitted 11 May, 2021; v1 submitted 16 March, 2021; originally announced March 2021.

Comments: Submitted to NJP

arXiv:2012.07648 [pdf, other]

A Multilevel Block Preconditioner for the HDG Trace System Applied to Incompressible Resistive MHD

Authors: Sriramkrishnan Muralikrishnan, Stephen Shannon, Tan Bui-Thanh, John N. Shadid

Abstract: We present a scalable block preconditioning strategy for the trace system coming from the high-order hybridized discontinuous Galerkin (HDG) discretization of incompressible resistive magnetohydrodynamics (MHD). We construct the block preconditioner with a least squares commutator (BFBT) approximation for the inverse of the Schur complement that segregates out the pressure unknowns of the trace system. The remaining velocity, magnetic field, and Lagrange multiplier unknowns form a coupled nodal unknown block (the upper block), for which a system algebraic multigrid (AMG) is used for the approximate inverse. The complexity of the MHD equations together with the algebraic nature of the statically condensed HDG trace system makes the choice of smoother in the system AMG part critical for the convergence and performance of the block preconditioner. Our numerical experiments show GMRES preconditioned by ILU(0) of overlap zero as a smoother inside system AMG performs best in terms of robustness, time per nonlinear iteration and memory requirements. With several transient test cases in 2D and 3D including the island coalescence problem at high Lundquist number we demonstrate the robustness and parallel scalability of the block preconditioner. Additionally for the upper block a preliminary study of an alternate nodal block system solver based on a multilevel approximate nested dissection is presented. On a 2D island coalescence problem the multilevel approximate nested dissection preconditioner shows better scalability with respect to mesh refinement than the system AMG, but is relatively less robust with respect to Lundquist number scaling.

Submitted 14 December, 2020; originally announced December 2020.

arXiv:2008.09441 [pdf, other]

Sparse Grids based Adaptive Noise Reduction strategy for Particle-In-Cell schemes

Authors: Sriramkrishnan Muralikrishnan, Antoine J. Cerfon, Matthias Frey, Lee F. Ricketson, Andreas Adelmann

Abstract: We propose a sparse grids based adaptive noise reduction strategy for electrostatic particle-in-cell (PIC) simulations. Our approach is based on the key idea of relying on sparse grids instead of a regular grid in order to increase the number of particles per cell for the same total number of particles, as first introduced in Ricketson and Cerfon (Plasma Phys. and Control. Fusion, 59(2), 024002). Adopting a new filtering perspective for this idea, we construct the algorithm so that it can be easily integrated into high performance large-scale PIC code bases. Unlike the physical and Fourier domain filters typically used in PIC codes, our approach automatically adapts to mesh size, number of particles per cell, smoothness of the density profile and the initial sampling technique. Thanks to the truncated combination technique, we can reduce the larger grid-based error of the standard sparse grids approach for non-aligned and non-smooth functions. We propose a heuristic based on formal error analysis for selecting the optimal truncation parameter at each time step, and develop a natural framework to minimize the total error in sparse PIC simulations. We demonstrate its efficiency and performance by means of two test cases: the diocotron instability in two dimensions, and the three-dimensional electron dynamics in a Penning trap. Our run time performance studies indicate that our new scheme can provide significant speedup and memory reduction as compared to regular PIC for achieving comparable accuracy in the charge density deposition.

Submitted 21 August, 2020; originally announced August 2020.

Showing 1–6 of 6 results for author: Muralikrishnan, S