-
Comparing Parallel Functional Array Languages: Programming and Performance
Authors:
David van Balen,
Tiziano De Matteis,
Clemens Grelck,
Troels Henriksen,
Aaron W. Hsu,
Gabriele K. Keller,
Thomas Koopman,
Trevor L. McDonell,
Cosmin Oancea,
Sven-Bodo Scholz,
Artjoms Sinkarovs,
Tom Smeding,
Phil Trinder,
Ivo Gabe de Wolff,
Alexandros Nikolaos Ziogas
Abstract:
Parallel functional array languages are an emerging class of programming languages that promise to combine low-effort parallel programming with good performance and performance portability. We systematically compare the designs and implementations of five different functional array languages: Accelerate, APL, DaCe, Futhark, and SaC. We demonstrate the expressiveness of functional array programming…
▽ More
Parallel functional array languages are an emerging class of programming languages that promise to combine low-effort parallel programming with good performance and performance portability. We systematically compare the designs and implementations of five different functional array languages: Accelerate, APL, DaCe, Futhark, and SaC. We demonstrate the expressiveness of functional array programming by means of four challenging benchmarks, namely N-body simulation, MultiGrid, Quickhull, and Flash Attention. These benchmarks represent a range of application domains and parallel computational models. We argue that the functional array code is much shorter and more comprehensible than the hand-optimized baseline implementations because it omits architecture-specific aspects. Instead, the language implementations generate both multicore and GPU executables from a single source code base. Hence, we further argue that functional array code could more easily be ported to, and optimized for, new parallel architectures than conventional implementations of numerical kernels. We demonstrate this potential by reporting the performance of the five parallel functional array languages on a total of 39 instances of the four benchmarks on both a 32-core AMD EPYC 7313 multicore system and on an NVIDIA A30 GPU. We explore in-depth why each language performs well or not so well on each benchmark and architecture. We argue that the results demonstrate that mature functional array languages have the potential to deliver performance competitive with the best available conventional techniques.
△ Less
Submitted 13 May, 2025;
originally announced May 2025.
-
Neural Network Verification is a Programming Language Challenge
Authors:
Lucas C. Cordeiro,
Matthew L. Daggitt,
Julien Girard-Satabin,
Omri Isac,
Taylor T. Johnson,
Guy Katz,
Ekaterina Komendantskaya,
Augustin Lemesle,
Edoardo Manino,
Artjoms Šinkarovs,
Haoze Wu
Abstract:
Neural network verification is a new and rapidly developing field of research. So far, the main priority has been establishing efficient verification algorithms and tools, while proper support from the programming language perspective has been considered secondary or unimportant. Yet, there is mounting evidence that insights from the programming language community may make a difference in the futu…
▽ More
Neural network verification is a new and rapidly developing field of research. So far, the main priority has been establishing efficient verification algorithms and tools, while proper support from the programming language perspective has been considered secondary or unimportant. Yet, there is mounting evidence that insights from the programming language community may make a difference in the future development of this domain. In this paper, we formulate neural network verification challenges as programming language challenges and suggest possible future solutions.
△ Less
Submitted 30 January, 2025; v1 submitted 10 January, 2025;
originally announced January 2025.
-
Correctness is Demanding, Performance is Frustrating
Authors:
Artjoms Sinkarovs,
Thomas Koopman,
Sven-Bodo Scholz
Abstract:
In this paper we demonstrate a technique for developing high performance applications with strong correctness guarantees. We use a theorem prover to derive a high-level specification of the application that includes correctness invariants of our choice. After that, within the same theorem prover, we implement an extraction of the specified application into a high-performance language of our choice…
▽ More
In this paper we demonstrate a technique for developing high performance applications with strong correctness guarantees. We use a theorem prover to derive a high-level specification of the application that includes correctness invariants of our choice. After that, within the same theorem prover, we implement an extraction of the specified application into a high-performance language of our choice. Concretely, we are using Agda to specify a framework for automatic differentiation (reverse mode) that is focused on index-safe tensors. This framework comes with an optimiser for tensor expressions and the ability to translate these expressions into SaC and C. We specify a canonical convolutional neural network within the proposed framework, compute the derivatives needed for the training phase and then demonstrate that the generated code matches the performance of hand-written code when running on a multi-core machine.
△ Less
Submitted 14 June, 2024;
originally announced June 2024.
-
Choosing is Losing: How to combine the benefits of shallow and deep embeddings through reflection
Authors:
Artjoms Šinkarovs,
Jesper Cockx
Abstract:
Dependently-typed host languages empower users to verify a wide range of properties of embedded languages and programs written in them. Designers of such embedded languages are faced with a difficult choice between using a shallow or a deep embedding. The former is easier to use because the entire infrastructure of the host langauge is immediately available. Meanwhile, the latter gives full access…
▽ More
Dependently-typed host languages empower users to verify a wide range of properties of embedded languages and programs written in them. Designers of such embedded languages are faced with a difficult choice between using a shallow or a deep embedding. The former is easier to use because the entire infrastructure of the host langauge is immediately available. Meanwhile, the latter gives full access to the structure of embedded programs, but is difficult to use in practice, especially when the embedded language is itself dependently typed.
The main insight presented in this paper is that the choice between shallow and deep embedding can be eliminated by working in a host language with reflection capabilities: we start from a shallow embedding that can use all libraries and tools of the host language, and later use reflection to expose the deep structure of the embedded programs.
Concretely, we apply this technique to embed three programming languages -- Kaleidoscope, SaC, and (a subset of) APL -- into the dependently typed theorem prover Agda, using dependent types to statically enforce several properties of interest. We then use Agda's reflection capabilities to extract the embedded programs back into the original language, so that the existing toolchain can be leveraged.
In this process, statically verified properties of the host language are mapped onto runtime checks in the target language, allowing extracted programs to interact safely with existing code.
Finally, we demonstrate the feasibility of our approach with the implementation and extraction of a convolutional neural network in our embedding of APL.\@
△ Less
Submitted 22 May, 2021;
originally announced May 2021.
-
Array Languages Make Neural Networks Fast
Authors:
Artjoms Šinkarovs,
Hans-Nikolai Vießmann,
Sven-Bodo Scholz
Abstract:
Modern machine learning frameworks are complex: they are typically organised in multiple layers each of which is written in a different language and they depend on a number of external libraries, but at their core they mainly consist of tensor operations. As array-oriented languages provide perfect abstractions to implement tensor operations, we consider a minimalistic machine learning framework t…
▽ More
Modern machine learning frameworks are complex: they are typically organised in multiple layers each of which is written in a different language and they depend on a number of external libraries, but at their core they mainly consist of tensor operations. As array-oriented languages provide perfect abstractions to implement tensor operations, we consider a minimalistic machine learning framework that is shallowly embedded in an array-oriented language and we study its productivity and performance. We do this by implementing a state of the art Convolutional Neural Network (CNN) and compare it against implementations in TensorFlow and PyTorch --- two state of the art industrial-strength frameworks. It turns out that our implementation is 2 and 3 times faster, even after fine-tuning the TensorFlow and PyTorch to our hardware --- a 64-core GPU-accelerated machine. The size of all three CNN specifications is the same, about 150 lines of code. Our mini framework is 150 lines of highly reusable hardware-agnostic code that does not depend on external libraries. The compiler for a host array language automatically generates parallel code for a chosen architecture. The key to such a balance between performance and portability lies in the design of the array language; in particular, the ability to express rank-polymorphic operations concisely, yet being able to do optimisations across them. This design builds on very few assumptions, and it is readily transferable to other contexts offering a clean approach to high-performance machine learning.
△ Less
Submitted 11 December, 2019;
originally announced December 2019.
-
A Lambda Calculus for Transfinite Arrays: Unifying Arrays and Streams
Authors:
Artjoms Sinkarovs,
Sven-Bodo Scholz
Abstract:
Array programming languages allow for concise and generic formulations of numerical algorithms, thereby providing a huge potential for program optimisation such as fusion, parallelisation, etc. One of the restrictions that these languages typically have is that the number of elements in every array has to be finite. This means that implementing streaming algorithms in such languages requires new t…
▽ More
Array programming languages allow for concise and generic formulations of numerical algorithms, thereby providing a huge potential for program optimisation such as fusion, parallelisation, etc. One of the restrictions that these languages typically have is that the number of elements in every array has to be finite. This means that implementing streaming algorithms in such languages requires new types of data structures, with operations that are not immediately compatible with existing array operations or compiler optimisations.
In this paper, we propose a design for a functional language that natively supports infinite arrays. We use ordinal numbers to introduce the notion of infinity in shapes and indices. By doing so, we obtain a calculus that naturally extends existing array calculi and, at the same time, allows for recursive specifications as they are found in stream- and list-based settings. Furthermore, the main language construct that can be thought of as an $n$-fold cons operator gives rise to expressing transfinite recursion in data, something that lists or streams usually do not support. This makes it possible to treat the proposed calculus as a unifying theory of arrays, lists and streams. We give an operational semantics of the proposed language, discuss design choices that we have made, and demonstrate its expressibility with several examples. We also demonstrate that the proposed formalism preserves a number of well-known universal equalities from array/list/stream theories, and discuss implementation-related challenges.
△ Less
Submitted 10 October, 2017;
originally announced October 2017.