ExaHyPE: An Engine for Parallel Dynamically Adaptive Simulations of Wave Problems
Authors:
Anne Reinarz,
Dominic E. Charrier,
Michael Bader,
Luke Bovard,
Michael Dumbser,
Kenneth Duru,
Francesco Fambri,
Alice-Agnes Gabriel,
Jean-Matthieu Gallard,
Sven Köppel,
Lukas Krenz,
Leonhard Rannabauer,
Luciano Rezzolla,
Philipp Samfass,
Maurizio Tavelli,
Tobias Weinzierl
Abstract:
ExaHyPE ("An Exascale Hyperbolic PDE Engine") is a software engine for solving systems of first-order hyperbolic partial differential equations (PDEs). Hyperbolic PDEs are typically derived from the conservation laws of physics and are useful in a wide range of application areas. Applications powered by ExaHyPE can be run on a student's laptop, but are also able to exploit thousands of processor c…
▽ More
ExaHyPE ("An Exascale Hyperbolic PDE Engine") is a software engine for solving systems of first-order hyperbolic partial differential equations (PDEs). Hyperbolic PDEs are typically derived from the conservation laws of physics and are useful in a wide range of application areas. Applications powered by ExaHyPE can be run on a student's laptop, but are also able to exploit thousands of processor cores on state-of-the-art supercomputers. The engine is able to dynamically increase the accuracy of the simulation using adaptive mesh refinement where required. Due to the robustness and shock capturing abilities of ExaHyPE's numerical methods, users of the engine can simulate linear and non-linear hyperbolic PDEs with very high accuracy. Users can tailor the engine to their particular PDE by specifying evolved quantities, fluxes, and source terms. A complete simulation code for a new hyperbolic PDE can often be realised within a few hours - a task that, traditionally, can take weeks, months, often years for researchers starting from scratch. In this paper, we showcase ExaHyPE's workflow and capabilities through real-world scenarios from our two main application areas: seismology and astrophysics.
△ Less
Submitted 18 May, 2020; v1 submitted 20 May, 2019;
originally announced May 2019.
Studies on the energy and deep memory behaviour of a cache-oblivious, task-based hyperbolic PDE solver
Authors:
Dominic E. Charrier,
Benjamin Hazelwood,
Ekaterina Tutlyaeva,
Michael Bader,
Michael Dumbser,
Andrey Kudryavtsev,
Alexander Moskovsky,
Tobias Weinzierl
Abstract:
We study the performance behaviour of a seismic simulation using the ExaHyPE engine with a specific focus on memory characteristics and energy needs. ExaHyPE combines dynamically adaptive mesh refinement (AMR) with ADER-DG. It is parallelized using tasks, and it is cache efficient. AMR plus ADER-DG yields a task graph which is highly dynamic in nature and comprises both arithmetically expensive ta…
▽ More
We study the performance behaviour of a seismic simulation using the ExaHyPE engine with a specific focus on memory characteristics and energy needs. ExaHyPE combines dynamically adaptive mesh refinement (AMR) with ADER-DG. It is parallelized using tasks, and it is cache efficient. AMR plus ADER-DG yields a task graph which is highly dynamic in nature and comprises both arithmetically expensive tasks and tasks which challenge the memory's latency. The expensive tasks and thus the whole code benefit from AVX vectorization, though we suffer from memory access bursts. A frequency reduction of the chip improves the code's energy-to-solution. Yet, it does not mitigate burst effects. The bursts' latency penalty becomes worse once we add Intel Optane technology, increase the core count significantly, or make individual, computationally heavy tasks fall out of close caches. Thread overbooking to hide away these latency penalties contra-productive with non-inclusive caches as it destroys the cache and vectorization character. In cases where memory-intense and computationally expensive tasks overlap, ExaHyPE's cache-oblivious implementation can exploit deep, non-inclusive, heterogeneous memory effectively, as main memory misses arise infrequently and slow down only few cores. We thus propose that upcoming supercomputing simulation codes with dynamic, inhomogeneous task graphs are actively supported by thread runtimes in intermixing tasks of different compute character, and we propose that future hardware actively allows codes to downclock the cores running particular task types.
△ Less
Submitted 25 March, 2019; v1 submitted 9 October, 2018;
originally announced October 2018.