Search | arXiv e-print repository

doi 10.1073/pnas.2122762119

Large Scale Distributed Linear Algebra With Tensor Processing Units

Authors: Adam G. M. Lewis, Jackson Beall, Martin Ganahl, Markus Hauru, Shrestha Basu Mallick, Guifre Vidal

Abstract: We have repurposed Google Tensor Processing Units (TPUs), application-specific chips developed for machine learning, into large-scale dense linear algebra supercomputers. The TPUs' fast inter-core interconnects (ICI)s, physically two-dimensional network topology, and high-bandwidth memory (HBM) permit distributed matrix multiplication algorithms to rapidly become computationally bound. In this regime, the matrix-multiply units (MXU)s dominate the runtime, yielding impressive scaling, performance, and raw size: operating in float32 precision, a full 2048-core pod of third generation TPUs can multiply two matrices with linear size $N = 2^{20} = 1\,048\,576$ in about 2 minutes. Via curated algorithms emphasizing large, single-core matrix multiplications, other tasks in dense linear algebra can similarly scale. As examples, we present (i) QR decomposition; (ii) resolution of linear systems; and (iii) the computation of matrix functions by polynomial iteration, demonstrated by the matrix polar factorization.

Submitted 16 December, 2021; originally announced December 2021.

Comments: 12 pages, 8 figures

arXiv:1911.13282

Quantum Computation with Machine-Learning-Controlled Quantum Stuff

Authors: Lucien Hardy, Adam G. M. Lewis

Abstract: We describe how one may go about performing quantum computation with arbitrary "quantum stuff", as long as it has some basic physical properties. Imagine a long strip of stuff, equipped with regularly spaced wires to provide input settings and to read off outcomes. After showing how the corresponding map from settings to outcomes can be construed as a quantum circuit, we provide a machine learning algorithm to tomographically "learn" which settings implement the members of a universal gate set. At optimum, arbitrary quantum gates, and thus arbitrary quantum programs, can be implemented using the stuff.

Submitted 29 November, 2019; originally announced November 2019.

Comments: 13 pages, 3 figures, 1 table, 3 algorithms

arXiv:1804.10120

Automatic generation of CUDA code performing tensor manipulations using C++ expression templates

Authors: Adam G. M. Lewis, Harald P. Pfeiffer

Abstract: We present a C++ library, TLoops, which uses a hierarchy of expression templates to represent operations upon tensorial quantities in single lines of C++ code that resemble analytic equations. These expressions may be run as-is, but may also be used to emit equivalent low-level C or CUDA code, which either performs the operations more quickly on the CPU, or allows them to be rapidly ported to run on NVIDIA GPUs. We detail the expression template and C++-class hierarchy that represents the expressions and which makes automatic code-generation possible. We then present benchmarks of the expression-template code, the automatically generated C code, and the automatically generated CUDA code running on several generations of NVIDIA GPU.

Submitted 24 April, 2018; originally announced April 2018.

Comments: 46 pages, 5 figures

Showing 1–3 of 3 results for author: Lewis, A G M