Search | arXiv e-print repository

doi 10.1007/978-3-030-86359-3_22

High Performance Implementation of Boris Particle Pusher on DPC++. A First Look at oneAPI

Authors: Valentin Volokitin, Alexey Bashinov, Evgeny Efimenko, Arkady Gonoskov, Iosif Meyerov

Abstract: New hardware architectures open up immense opportunities for supercomputer simulations. However, programming techniques for different architectures vary significantly, which leads to the necessity of developing and supporting multiple code versions, each being optimized for specific hardware features. The oneAPI framework, recently introduced by Intel, contains a set of programming tools for the development of portable codes that can be compiled and fine-tuned for CPUs, GPUs, FPGAs, and accelerators. In this paper, we report on the experience of porting the implementation of Boris particle pusher to oneAPI. Boris particle pusher is one of the most demanding computational stages of the Particle-in-Cell method, which, in particular, is used for supercomputer simulations of laser-plasma interactions. We show how to adapt the C++ implementation of the particle push algorithm from the Hi-Chi project to the DPC++ programming language and report the performance of the code on high-end Intel CPUs (Xeon Platinum 8260L) and Intel GPUs (P630 and Iris Xe Max). It turned out that our C++ code can be easily ported to DPC++. We found that on CPUs the resulting DPC++ code is only ~10% on average inferior to the optimized C++ code. Moreover, the code is compiled and run on new Intel GPUs without any specific optimizations and shows the expected performance, taking into account the parameters of the hardware.

Submitted 9 April, 2021; originally announced April 2021.

arXiv:2008.10468 [pdf]

doi 10.1088/1742-6596/1640/1/012015

Optimized routines for event generators in QED-PIC codes

Authors: V. Volokitin, S. Bastrakov, A. Bashinov, E. Efimenko, A. Muraviev, A. Gonoskov, I. Meyerov

Abstract: In recent years, the prospects of performing fundamental and applied studies at the next-generation high-intensity laser facilities have greatly stimulated the interest in performing large-scale simulations of laser interaction with matter with the account for quantum electrodynamics (QED) processes such as emission of high energy photons and decay of such photons into electron-positron pairs. These processes can be modeled via probabilistic routines that include frequent computation of synchrotron functions and can constitute significant computational demands within accordingly extended Particle-in-Cell (QED-PIC) algorithms. In this regard, the optimization of these routines is of great interest. In this paper, we propose and describe two modifications. First, we derive a more accurate upper-bound estimate for the rate of QED events and use it to arrange local sub-stepping of the global time step in a significantly more efficient way than done previously. Second, we present a new high-performance implementation of synchrotron functions. Our optimizations made it possible to speed up the computations by a factor of up to 13.7 depending on the problem. Our implementation is integrated into the PICADOR and Hi-Chi codes, the latter of which is distributed publicly (https://github.com/hi-chi/pyHiChi).

Submitted 24 August, 2020; originally announced August 2020.

Journal ref: J. Phys.: Conf. Ser. 1640 012015 (2020)

arXiv:1905.08217 [pdf]

Exploiting Parallelism on Shared Memory in the QED Particle-in-Cell Code PICADOR with Greedy Load Balancing

Authors: Iosif Meyerov, Sergei Bastrakov, Aleksei Bashinov, Evgeny Efimenko, Alexander Panov, Elena Panova, Igor Surmin, Valentin Volokitin, Arkady Gonoskov

Abstract: State-of-the-art numerical simulations of laser plasma by means of the Particle-in-Cell method are often extremely computationally intensive. Therefore there is a growing need for development of approaches for efficient utilization of resources of modern supercomputers. In this paper, we address the problem of a substantially non-uniform and dynamically varying distribution of macroparticles in a computational area in simulating quantum electrodynamic (QED) cascades. We propose and evaluate a load balancing scheme for shared memory systems, which allows subdividing individual cells of the computational domain into work portions with subsequent dynamic distribution of these portions between OpenMP threads. Computational experiments on 1D, 2D, and 3D QED simulations show that the proposed scheme outperforms the previously developed standard and custom schemes in the PICADOR code by 2.1 to 10 times when employing several Intel Cascade Lake CPUs.

Submitted 20 May, 2019; originally announced May 2019.

Comments: 11 pages, 5 figures. Submitted to PPAM-2019

arXiv:1608.01009 [pdf, ps, other]

Co-design of a particle-in-cell plasma simulation code for Intel Xeon Phi: a first look at Knights Landing

Authors: Igor Surmin, Sergey Bastrakov, Zakhar Matveev, Evgeny Efimenko, Arkady Gonoskov, Iosif Meyerov

Abstract: Three dimensional particle-in-cell laser-plasma simulation is an important area of computational physics. Solving state-of-the-art problems requires large-scale simulation on a supercomputer using specialized codes. A growing demand in computational resources inspires research in improving efficiency and co-design for supercomputers based on many-core architectures. This paper presents first performance results of the particle-in-cell plasma simulation code PICADOR on the recently introduced Knights Landing generation of Intel Xeon Phi. A straightforward rebuilding of the code yields a 2.43 x speedup compared to the previous Knights Corner generation. Further code optimization results in an additional 1.89 x speedup. The optimization performed is beneficial not only for Knights Landing, but also for high-end CPUs and Knights Corner. The optimized version achieves 100 GFLOPS double precision performance on a Knights Landing device with the speedups of 2.35 x compared to a 14-core Haswell CPU and 3.47 x compared to a 61-core Knights Corner Xeon Phi.

Submitted 2 August, 2016; originally announced August 2016.

Showing 1–4 of 4 results for author: Gonoskov, A