-
SIMD-Optimized Search Over Sorted Data
Authors:
Benjamin Mastripolito,
Nicholas Koskelo,
Dylan Weatherred,
David A. Pimentel,
Daniel Sheppard,
Anna Pietarila Graham,
Laura Monroe,
Robert Robey
Abstract:
Applications often require a fast, single-threaded search algorithm over sorted data, typical in table-lookup operations. We explore various search algorithms for a large number of search candidates over a relatively small array of logarithmically-distributed sorted data. These include an innovative hash-based search that takes advantage of floating point representation to bin data by the exponent…
▽ More
Applications often require a fast, single-threaded search algorithm over sorted data, typical in table-lookup operations. We explore various search algorithms for a large number of search candidates over a relatively small array of logarithmically-distributed sorted data. These include an innovative hash-based search that takes advantage of floating point representation to bin data by the exponent. Algorithms that can be optimized to take advantage of SIMD vector instructions are of particular interest. We then conduct a case study applying our results and analyzing algorithmic performance with the EOSPAC package. EOSPAC is a table look-up library for manipulation and interpolation of SESAME equation-of-state data. Our investigation results in a couple of algorithms with better performance with a best case 8x speedup over the original EOSPAC Hunt-and-Locate implementation. Our techniques are generalizable to other instances of search algorithms seeking to get a performance boost from vectorization.
△ Less
Submitted 6 December, 2021;
originally announced December 2021.
-
Performance Analysis of CP2K Code for Ab Initio Molecular Dynamics
Authors:
Dewi Yokelson,
Nikolay V. Tkachenko,
Robert Robey,
Ying Wai Li,
Pavel A. Dub
Abstract:
Using a realistic molecular catalyst system, we conduct scaling studies of ab initio molecular dynamics simulations using the CP2K code on both Intel Xeon CPU and NVIDIA V100 GPU architectures. We explore using process placement and affinity to gain additional performance improvements. We also use statistical methods to understand performance changes in spite of the variability in runtime for each…
▽ More
Using a realistic molecular catalyst system, we conduct scaling studies of ab initio molecular dynamics simulations using the CP2K code on both Intel Xeon CPU and NVIDIA V100 GPU architectures. We explore using process placement and affinity to gain additional performance improvements. We also use statistical methods to understand performance changes in spite of the variability in runtime for each molecular dynamics timestep. We found ideal conditions for CPU runs included at least four MPI ranks per node, bound evenly across each socket, and fully utilizing processing cores with one OpenMP thread per core, no benefit was shown from reserving cores for the system. The CPU-only simulations scaled at 70% or more of the ideal scaling up to 10 compute nodes, after which the returns began to diminish more quickly. Simulations on a single 40-core node with two NVIDIA V100 GPUs for acceleration achieved over 3.7x speedup compared to the fastest single 36-core node CPU-only version, and showed 13% speedup over the fastest time we achieved across five CPU-only nodes.
△ Less
Submitted 9 September, 2021;
originally announced September 2021.
-
ForOpenCL: Transformations Exploiting Array Syntax in Fortran for Accelerator Programming
Authors:
Matthew J. Sottile,
Craig E Rasmussen,
Wayne N. Weseloh,
Robert W. Robey,
Daniel Quinlan,
Jeffrey Overbey
Abstract:
Emerging GPU architectures for high performance computing are well suited to a data-parallel programming model. This paper presents preliminary work examining a programming methodology that provides Fortran programmers with access to these emerging systems. We use array constructs in Fortran to show how this infrequently exploited, standardized language feature is easily transformed to lower-level…
▽ More
Emerging GPU architectures for high performance computing are well suited to a data-parallel programming model. This paper presents preliminary work examining a programming methodology that provides Fortran programmers with access to these emerging systems. We use array constructs in Fortran to show how this infrequently exploited, standardized language feature is easily transformed to lower-level accelerator code. The transformations in ForOpenCL are based on a simple mapping from Fortran to OpenCL. We demonstrate, using a stencil code solving the shallow-water fluid equations, that the performance of the ForOpenCL compiler-generated transformations is comparable with that of hand-optimized OpenCL code.
△ Less
Submitted 11 July, 2011;
originally announced July 2011.