Search | arXiv e-print repository

arXiv:2004.06800 [pdf, other]

doi 10.1088/2632-2153/abbd2e

A hybrid classical-quantum workflow for natural language processing

Authors: Lee J. O'Riordan, Myles Doyle, Fabio Baruffa, Venkatesh Kannan

Abstract: Natural language processing (NLP) problems are ubiquitous in classical computing, where they often require significant computational resources to infer sentence meanings. With the appearance of quantum computing hardware and simulators, it is worth developing methods to examine such problems on these platforms. In this manuscript we demonstrate the use of quantum computing models to perform NLP ta… ▽ More Natural language processing (NLP) problems are ubiquitous in classical computing, where they often require significant computational resources to infer sentence meanings. With the appearance of quantum computing hardware and simulators, it is worth developing methods to examine such problems on these platforms. In this manuscript we demonstrate the use of quantum computing models to perform NLP tasks, where we represent corpus meanings, and perform comparisons between sentences of a given structure. We develop a hybrid workflow for representing small and large scale corpus data sets to be encoded, processed, and decoded using a quantum circuit model. In addition, we provide our results showing the efficacy of the method, and release our developed toolkit as an open software suite. △ Less

Submitted 12 April, 2020; originally announced April 2020.

Comments: For associated code, see https://github.com/ICHEC/QNLP

arXiv:2002.08161 [pdf, other]

doi 10.1016/j.future.2020.05.003

Honing and proofing Astrophysical codes on the road to Exascale. Experiences from code modernization on many-core systems

Authors: Salvatore Cielo, Luigi Iapichino, Fabio Baruffa, Matteo Bugli, Christoph Federrath

Abstract: The complexity of modern and upcoming computing architectures poses severe challenges for code developers and application specialists, and forces them to expose the highest possible degree of parallelism, in order to make the best use of the available hardware. The Intel$^{(R)}$ Xeon Phi$^{(TM)}$ of second generation (code-named Knights Landing, henceforth KNL) is the latest many-core system, whic… ▽ More The complexity of modern and upcoming computing architectures poses severe challenges for code developers and application specialists, and forces them to expose the highest possible degree of parallelism, in order to make the best use of the available hardware. The Intel$^{(R)}$ Xeon Phi$^{(TM)}$ of second generation (code-named Knights Landing, henceforth KNL) is the latest many-core system, which implements several interesting hardware features like for example a large number of cores per node (up to 72), the 512 bits-wide vector registers and the high-bandwidth memory. The unique features of KNL make this platform a powerful testbed for modern HPC applications. The performance of codes on KNL is therefore a useful proxy of their readiness for future architectures. In this work we describe the lessons learnt during the optimisation of the widely used codes for computational astrophysics P-Gadget-3, Flash and Echo. Moreover, we present results for the visualisation and analysis tools VisIt and yt. These examples show that modern architectures benefit from code optimisation at different levels, even more than traditional multi-core systems. However, the level of modernisation of typical community codes still needs improvements, for them to fully utilise resources of novel architectures. △ Less

Submitted 19 February, 2020; originally announced February 2020.

Comments: 16 pages, 10 figures, 4 tables. To be published in Future Generation of Computer Systems (FGCS), Special Issue on "On The Road to Exascale II: Advances in High Performance Computing and Simulations"

ACM Class: C.0; D.4.8; J.0; J.2

Journal ref: Future Generation of Computer Systems,Volume 112, November 2020, Pages 93-107

arXiv:2001.10554 [pdf, other]

doi 10.1088/2058-9565/ab8505

Intel Quantum Simulator: A cloud-ready high-performance simulator of quantum circuits

Authors: Gian Giacomo Guerreschi, Justin Hogaboam, Fabio Baruffa, Nicolas P. D. Sawaya

Abstract: Classical simulation of quantum computers will continue to play an essential role in the progress of quantum information science, both for numerical studies of quantum algorithms and for modeling noise and errors. Here we introduce the latest release of Intel Quantum Simulator (IQS), formerly known as qHiPSTER. The high-performance computing (HPC) capability of the software allows users to leverag… ▽ More Classical simulation of quantum computers will continue to play an essential role in the progress of quantum information science, both for numerical studies of quantum algorithms and for modeling noise and errors. Here we introduce the latest release of Intel Quantum Simulator (IQS), formerly known as qHiPSTER. The high-performance computing (HPC) capability of the software allows users to leverage the available hardware resources provided by supercomputers, as well as available public cloud computing infrastructure. To take advantage of the latter platform, together with the distributed simulation of each separate quantum state, IQS allows to subdivide the computational resources to simulate a pool of related circuits in parallel. We highlight the technical implementation of the distributed algorithm and details about the new pool functionality. We also include some basic benchmarks (up to 42 qubits) and performance results obtained using HPC infrastructure. Finally, we use IQS to emulate a scenario in which many quantum devices are running in parallel to implement the quantum approximate optimization algorithm, using particle swarm optimization as the classical subroutine. The results demonstrate that the hyperparameters of this classical optimization algorithm depends on the total number of quantum circuit simulations one has the bandwidth to perform. Intel Quantum Simulator has been released open-source with permissive licensing and is designed to simulate a large number of qubits, to emulate multiple quantum devices running in parallel, and/or to study the effects of decoherence and other hardware errors on calculation results. △ Less

Submitted 5 May, 2020; v1 submitted 28 January, 2020; originally announced January 2020.

Comments: Improved figures and updated link to the GitHub repository

Journal ref: Quantum Sci. Technol. 5, 034007 (2020)

arXiv:1910.07855 [pdf]

Speeding simulation analysis up with yt and Intel Distribution for Python

Authors: Salvatore Cielo, Luigi Iapichino, Fabio Baruffa

Abstract: As modern scientific simulations grow ever more in size and complexity, even their analysis and post-processing becomes increasingly demanding, calling for the use of HPC resources and methods. yt is a parallel, open source post-processing python package for numerical simulations in astrophysics, made popular by its cross-format compatibility, its active community of developers and its integration… ▽ More As modern scientific simulations grow ever more in size and complexity, even their analysis and post-processing becomes increasingly demanding, calling for the use of HPC resources and methods. yt is a parallel, open source post-processing python package for numerical simulations in astrophysics, made popular by its cross-format compatibility, its active community of developers and its integration with several other professional Python instruments. The Intel Distribution for Python enhances yt's performance and parallel scalability, through the optimization of lower-level libraries Numpy and Scipy, which make use of the optimized Intel Math Kernel Library (Intel-MKL) and the Intel MPI library for distributed computing. The library package yt is used for several analysis tasks, including integration of derived quantities, volumetric rendering, 2D phase plots, cosmological halo analysis and production of synthetic X-ray observation. In this paper, we provide a brief tutorial for the installation of yt and the Intel Distribution for Python, and the execution of each analysis task. Compared to the Anaconda python distribution, using the provided solution one can achieve net speedups up to 4.6x on Intel Xeon Scalable processors (codename Skylake). △ Less

Submitted 17 October, 2019; originally announced October 2019.

Comments: 3 pages, 1 figure, published on Intel Parallel Universe Magazine

Journal ref: Issue 38, 2019, p. 27-32

arXiv:1905.10090 [pdf]

doi 10.1109/HPEC.2019.8916576

Deploying AI Frameworks on Secure HPC Systems with Containers

Authors: David Brayford, Sofia Vallecorsa, Atanas Atanasov, Fabio Baruffa, Walter Riviera

Abstract: The increasing interest in the usage of Artificial Intelligence techniques (AI) from the research community and industry to tackle "real world" problems, requires High Performance Computing (HPC) resources to efficiently compute and scale complex algorithms across thousands of nodes. Unfortunately, typical data scientists are not familiar with the unique requirements and characteristics of HPC env… ▽ More The increasing interest in the usage of Artificial Intelligence techniques (AI) from the research community and industry to tackle "real world" problems, requires High Performance Computing (HPC) resources to efficiently compute and scale complex algorithms across thousands of nodes. Unfortunately, typical data scientists are not familiar with the unique requirements and characteristics of HPC environments. They usually develop their applications with high-level scripting languages or frameworks such as TensorFlow and the installation process often requires connection to external systems to download open source software during the build. HPC environments, on the other hand, are often based on closed source applications that incorporate parallel and distributed computing API's such as MPI and OpenMP, while users have restricted administrator privileges, and face security restrictions such as not allowing access to external systems. In this paper we discuss the issues associated with the deployment of AI frameworks in a secure HPC environment and how we successfully deploy AI frameworks on SuperMUC-NG with Charliecloud. △ Less

Submitted 24 May, 2019; originally announced May 2019.

Comments: 6 pages, 2 figures, 2019 IEEE High Performance Extreme Computing Conference

arXiv:1810.04597 [pdf]

ECHO-3DHPC: Advance the performance of astrophysics simulations with code modernization

Authors: Matteo Bugli, Luigi Iapichino, Fabio Baruffa

Abstract: We present recent developments in the parallelization scheme of ECHO-3DHPC, an efficient astrophysical code used in the modelling of relativistic plasmas. With the help of the Intel Software Development Tools, like Fortran compiler and Profile-Guided Optimization (PGO), Intel MPI library, VTune Amplifier and Inspector we have investigated the performance issues and improved the application scalabi… ▽ More We present recent developments in the parallelization scheme of ECHO-3DHPC, an efficient astrophysical code used in the modelling of relativistic plasmas. With the help of the Intel Software Development Tools, like Fortran compiler and Profile-Guided Optimization (PGO), Intel MPI library, VTune Amplifier and Inspector we have investigated the performance issues and improved the application scalability and the time to solution. The node-level performance is improved by $2.3 \times$ and, thanks to the improved threading parallelisation, the hybrid MPI-OpenMP version of the code outperforms the MPI-only, thus lowering the MPI communication overhead. △ Less

Submitted 10 October, 2018; originally announced October 2018.

Comments: 7 pages, 6 figures. Accepted for publication on The Parallel Universe Magazine ( https://software.intel.com/en-us/parallel-universe-magazine )

Journal ref: Parallel Universe Magazine 34 (2018), 49

arXiv:1612.06090 [pdf, other]

doi 10.1109/HPCS.2017.64

Performance Optimisation of Smoothed Particle Hydrodynamics Algorithms for Multi/Many-Core Architectures

Authors: Fabio Baruffa, Luigi Iapichino, Nicolay J. Hammer, Vasileios Karakasis

Abstract: We describe a strategy for code modernisation of Gadget, a widely used community code for computational astrophysics. The focus of this work is on node-level performance optimisation, targeting current multi/many-core IntelR architectures. We identify and isolate a sample code kernel, which is representative of a typical Smoothed Particle Hydrodynamics (SPH) algorithm. The code modifications inclu… ▽ More We describe a strategy for code modernisation of Gadget, a widely used community code for computational astrophysics. The focus of this work is on node-level performance optimisation, targeting current multi/many-core IntelR architectures. We identify and isolate a sample code kernel, which is representative of a typical Smoothed Particle Hydrodynamics (SPH) algorithm. The code modifications include threading parallelism optimisation, change of the data layout into Structure of Arrays (SoA), auto-vectorisation and algorithmic improvements in the particle sorting. We obtain shorter execution time and improved threading scalability both on Intel XeonR ($2.6 \times$ on Ivy Bridge) and Xeon PhiTM ($13.7 \times$ on Knights Corner) systems. First few tests of the optimised code result in $19.1 \times$ faster execution on second generation Xeon Phi (Knights Landing), thus demonstrating the portability of the devised optimisation solutions to upcoming architectures. △ Less

Submitted 10 May, 2017; v1 submitted 19 December, 2016; originally announced December 2016.

Comments: 8 pages, 2 columns, 4 figures, accepted as paper at HPCS Proceedings 2017, IEEE XPLORE

Journal ref: proceedings of the 2017 International Conference on High Performance Computing & Simulation (HPCS 2017), 381

Showing 1–7 of 7 results for author: Baruffa, F