-
Magnus integrators on multicore CPUs and GPUs
Authors:
N. Auer,
L. Einkemmer,
P. Kandolf,
A. Ostermann
Abstract:
In the present paper we consider numerical methods to solve the discrete Schrödinger equation with a time-dependent Hamiltonian (motivated by problems encountered in the study of spin systems). We consider both short-range interactions, which lead to evolution equations involving sparse matrices, and long-range interactions, which lead to dense matrices. These two settings show very different computational characteristics. We use Magnus integrators for time integration and employ a framework based on Leja interpolation to compute the resulting action of the matrix exponential. We consider both traditional Magnus integrators (which are extensively used for these types of problems in the literature) and the recently developed commutator-free Magnus integrators, and implement them on modern CPU and GPU (graphics processing unit) based systems.
We find that GPUs can yield a significant speed-up (up to a factor of $10$ in the dense case) for these types of problems. In the sparse case, GPUs are only advantageous for large problem sizes, and the achieved speed-ups are more modest. In most cases the commutator-free variant is superior, but this advantage is rather small, especially on the GPU. In fact, none of the advantage of commutator-free methods on GPUs (and on multi-core CPUs) is due to the elimination of commutators. This has important consequences for the design of more efficient numerical methods.
Submitted 28 March, 2018; v1 submitted 19 September, 2017;
originally announced September 2017.
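For orientation, the two families of integrators named in the abstract above can be illustrated with standard fourth-order textbook schemes. The abstract does not state which specific schemes or orders the paper uses, so the formulas below are an assumption chosen for illustration only. Write the Schrödinger equation as $\psi'(t) = A(t)\psi(t)$ with $A(t) = -\mathrm{i}H(t)$, let $h$ be the step size, and let $A_{1,2} = A\bigl(t_n + (\tfrac12 \mp \tfrac{\sqrt3}{6})h\bigr)$ be the values at the two Gauss nodes.

$$
\text{traditional Magnus:}\quad
\psi_{n+1} = \exp\!\Bigl(\tfrac{h}{2}(A_1 + A_2) - \tfrac{\sqrt3}{12}\,h^2\,[A_1, A_2]\Bigr)\psi_n ,
$$

$$
\text{commutator-free:}\quad
\psi_{n+1} = \exp\!\bigl(h(\alpha_1 A_1 + \alpha_2 A_2)\bigr)\,
             \exp\!\bigl(h(\alpha_2 A_1 + \alpha_1 A_2)\bigr)\psi_n ,
\qquad \alpha_{1,2} = \tfrac14 \mp \tfrac{\sqrt3}{6}.
$$

The first form needs one action of a matrix exponential per step but requires the commutator $[A_1, A_2]$; the second avoids the commutator at the cost of two exponential actions per step.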
-
Evaluation of the Partitioned Global Address Space (PGAS) model for an inviscid Euler solver
Authors:
Martina Prugger,
Lukas Einkemmer,
Alexander Ostermann
Abstract:
In this paper we evaluate the performance of Unified Parallel C (UPC), which implements the partitioned global address space programming model, using a numerical method that is widely used in fluid dynamics. In order to evaluate the incremental approach to parallelization (which is possible with UPC) and its performance characteristics, we implement different levels of optimization of the UPC code and compare them with an MPI parallelization on four different clusters of the Austrian HPC infrastructure (LEO3, LEO3E, VSC2, VSC3) and on an Intel Xeon Phi. We find that it is significantly easier to develop in UPC than in MPI and that the performance achieved is comparable to MPI in most situations. Relative to MPI, the obtained results show worse performance on VSC2, competitive performance on LEO3, LEO3E and VSC3, and superior performance on the Intel Xeon Phi.
Submitted 12 November, 2016; v1 submitted 14 January, 2016;
originally announced January 2016.
-
Fast algorithms for morphological operations using run-length encoded binary images
Authors:
Gregor Ehrensperger,
Alexander Ostermann,
Felix Schwitzer
Abstract:
This paper presents innovative algorithms to efficiently compute erosions and dilations of run-length encoded (RLE) binary images with arbitrarily shaped structuring elements. An RLE image is given by a set of runs, where a run is a horizontal concatenation of foreground pixels. The proposed algorithms extract the skeleton of the structuring element and build distance tables of the input image, which store the distance to the next background pixel on the left-hand and right-hand sides. This information is then used to speed up the computation of the erosion and dilation operators by enabling techniques that skip the analysis of certain pixels whenever a hit or miss occurs. Additionally, the input image is trimmed during the preprocessing steps on the basis of two primitive criteria. Experimental results show the advantages over other algorithms. The source code of our algorithms is available in C++.
Submitted 4 April, 2015;
originally announced April 2015.
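To make the run-length representation described in the abstract above concrete, here is a minimal Python sketch. It is not the paper's algorithm (which handles arbitrary structuring elements using skeletons and distance tables, and is published in C++); the function names, the `(row, start, end)` tuple layout, and the restriction to a centered horizontal line segment as structuring element are all assumptions made for illustration.

```python
def rle_encode(image):
    """Encode a binary image (list of rows of 0/1) as runs.

    Each run is (row, start_col, end_col) with inclusive column bounds,
    i.e. a maximal horizontal sequence of foreground pixels.
    """
    runs = []
    for y, row in enumerate(image):
        start = None
        for x, pixel in enumerate(row):
            if pixel and start is None:
                start = x                      # a new run begins
            elif not pixel and start is not None:
                runs.append((y, start, x - 1))  # the run ended at x - 1
                start = None
        if start is not None:                  # run touches the right border
            runs.append((y, start, len(row) - 1))
    return runs


def rle_decode(runs, height, width):
    """Rebuild the dense binary image from its runs."""
    image = [[0] * width for _ in range(height)]
    for y, start, end in runs:
        for x in range(start, end + 1):
            image[y][x] = 1
    return image


def erode_horizontal(runs, radius):
    """Erode by a centered horizontal segment of 2 * radius + 1 pixels.

    Pixels outside the image count as background. Each run simply
    shrinks by `radius` on both sides; runs that are too short vanish.
    """
    return [(y, s + radius, e - radius)
            for y, s, e in runs
            if e - s >= 2 * radius]


image = [[0, 1, 1, 1, 0],
         [1, 1, 0, 0, 1]]
runs = rle_encode(image)
eroded = erode_horizontal(runs, 1)
```

Even in this restricted case the benefit of the encoding is visible: the erosion touches each run once instead of each pixel, so its cost depends on the number of runs rather than on the image area.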