-
Efficient GPU Implementation of Affine Index Permutations on Arrays
Authors:
Mathis Bouverot-Dupuis,
Mary Sheeran
Abstract:
Optimal usage of the memory system is a key element of fast GPU algorithms. Unfortunately many common algorithms fail in this regard despite exhibiting great regularity in memory access patterns. In this paper we propose efficient kernels to permute the elements of an array. We handle a class of permutations known as Bit Matrix Multiply Complement (BMMC) permutations, for which we design kernels of speed comparable to that of a simple array copy. This is a first step towards implementing a set of array combinators based on these permutations.
Submitted 17 July, 2023; v1 submitted 13 June, 2023;
originally announced June 2023.
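As a minimal illustration of the BMMC permutations named in the abstract above: under the standard definition, a BMMC permutation sends index x, read as a bit vector, to A·x XOR c over GF(2), where A is an invertible bit matrix and c is a complement vector. The sketch below is a plain sequential reference, not the paper's GPU kernel. The names `bmmc_index` and `bmmc_permute` and the row-as-bitmask encoding of A are assumptions made for this example.

```python
def bmmc_index(x, rows, c):
    """Return A*x XOR c over GF(2).

    rows[i] is row i of the bit matrix A packed into an int (hypothetical
    encoding chosen for this sketch); output bit i is the parity of rows[i] & x.
    """
    y = 0
    for i, row in enumerate(rows):
        parity = bin(row & x).count("1") & 1
        y |= parity << i
    return y ^ c


def bmmc_permute(src, rows, c):
    """Scatter src[x] to position A*x XOR c; len(src) must be 2**len(rows)."""
    n = len(src)
    if n != 1 << len(rows):
        raise ValueError("array length must be 2**(number of matrix rows)")
    dst = [None] * n
    filled = [False] * n
    for x in range(n):
        y = bmmc_index(x, rows, c)
        if filled[y]:
            # Two indices collided, so A is singular and this is not a permutation.
            raise ValueError("matrix A is not invertible over GF(2)")
        dst[y] = src[x]
        filled[y] = True
    return dst


# Bit reversal on 3-bit indices: output bit i takes input bit 2 - i.
bit_reverse_rows = [0b100, 0b010, 0b001]
reversed_bits = bmmc_permute(list(range(8)), bit_reverse_rows, 0)

# Identity matrix with an all-ones complement vector reverses the array.
identity_rows = [0b001, 0b010, 0b100]
flipped = bmmc_permute(list(range(8)), identity_rows, 0b111)
```

Bit reversal and array reversal are both special cases here, which suggests why a single well-tuned kernel for the whole class is attractive.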
-
Synchron -- An API and Runtime for Embedded Systems
Authors:
Abhiroop Sarkar,
Bo Joel Svensson,
Mary Sheeran
Abstract:
Programming embedded systems applications involves writing concurrent, event-driven and timing-aware programs. Traditionally, such programs are written in low-level machine-oriented programming languages like C or Assembly. We present an alternative by introducing Synchron, an API that offers high-level abstractions to the programmer while supporting the low-level infrastructure in an associated runtime system and one-time-effort drivers. Embedded systems applications exhibit the general characteristics of being (i) concurrent, (ii) I/O-bound and (iii) timing-aware. To address each of these concerns, the Synchron API consists of three components: (1) a Concurrent ML (CML) inspired message-passing concurrency model, (2) a message-passing--based I/O interface that translates between low-level interrupt-based and memory-mapped peripherals, and (3) a timing operator, $syncT$, that marries CML's $sync$ operator with timing windows inspired by the TinyTimber kernel. We implement the Synchron API as the bytecode instructions of a virtual machine called SynchronVM. SynchronVM hosts a Caml-inspired functional language as its frontend language, and the backend of the VM supports the STM32F4 and NRF52 microcontrollers, with RAM on the order of hundreds of kilobytes. We illustrate the expressiveness of the Synchron API by showing examples of expressing state machines commonly found in embedded systems. The timing functionality is demonstrated through a music programming exercise. Finally, we provide benchmarks on the response time, jitter rates, memory, and power usage of the SynchronVM.
Submitted 6 May, 2022;
originally announced May 2022.
-
Higher-Order Concurrency for Microcontrollers
Authors:
Abhiroop Sarkar,
Robert Krook,
Bo Joel Svensson,
Mary Sheeran
Abstract:
Programming microcontrollers involves low-level interfacing with hardware and peripherals that are concurrent and reactive. Such programs are typically written in a mixture of C and assembly using concurrent language extensions (like $\texttt{FreeRTOS tasks}$ and $\texttt{semaphores}$), resulting in unsafe, callback-driven, error-prone and difficult-to-maintain code.
We address this challenge by introducing $\texttt{SenseVM}$ - a bytecode-interpreted virtual machine that provides a message-passing based $\textit{higher-order concurrency}$ model, originally introduced by Reppy, for microcontroller programming. This model treats synchronous operations as first-class values (called $\texttt{Events}$) akin to the treatment of first-class functions in functional languages. This primarily allows the programmer to compose and tailor their own concurrency abstractions and, additionally, abstracts away unsafe memory operations, common in shared-memory concurrency models, thereby making microcontroller programs safer, composable and easier-to-maintain.
Our VM is made portable via a low-level $\textit{bridge}$ interface, built atop the embedded OS - Zephyr. The bridge is implemented by all drivers and designed such that programming in response to a software message or a hardware interrupt remains uniform and indistinguishable. In this paper we demonstrate the features of our VM through an example, written in a Caml-like functional language, running on the $\texttt{nRF52840}$ and $\texttt{STM32F4}$ microcontrollers.
Submitted 1 September, 2021; v1 submitted 17 August, 2021;
originally announced August 2021.
-
Hailstorm : A Statically-Typed, Purely Functional Language for IoT Applications
Authors:
Abhiroop Sarkar,
Mary Sheeran
Abstract:
With the growing ubiquity of the Internet of Things (IoT), more complex logic is being programmed on resource-constrained IoT devices, almost exclusively using the C programming language. While C provides low-level control over memory, it lacks a number of high-level programming abstractions such as higher-order functions, polymorphism, strong static typing, memory safety, and automatic memory management.
We present Hailstorm, a statically-typed, purely functional programming language that attempts to address the above problem. It is a high-level programming language with a strict typing discipline. It supports features like higher-order functions, tail-recursion, and automatic memory management, to program IoT devices in a declarative manner. Applications running on these devices tend to be heavily dominated by I/O. Hailstorm tracks side effects like I/O in its type system using resource types. This choice allowed us to explore the design of a purely functional standalone language, in an area where it is more common to embed a functional core in an imperative shell. The language borrows the combinators of arrowized FRP, but has discrete-time semantics. The design of the full set of combinators is work in progress, driven by examples. So far, we have evaluated Hailstorm by writing standard examples from the literature (earthquake detection, a railway crossing system and various other clocked systems), and also running examples on the GRiSP embedded systems board, through generation of Erlang.
Submitted 27 May, 2021;
originally announced May 2021.