-
Skitter: A Distributed Stream Processing Framework with Pluggable Distribution Strategies
Authors:
Mathijs Saey,
Joeri De Koster,
Wolfgang De Meuter
Abstract:
Context: Distributed Stream Processing Frameworks (DSPFs) are popular tools for expressing real-time Big Data applications that have to handle enormous volumes of data in real time. These frameworks distribute their applications over a cluster in order to scale horizontally along with the amount of incoming data.
Inquiry: Crucial for the performance of such applications is the **distribution str…
▽ More
Context: Distributed Stream Processing Frameworks (DSPFs) are popular tools for expressing real-time Big Data applications that have to handle enormous volumes of data in real time. These frameworks distribute their applications over a cluster in order to scale horizontally along with the amount of incoming data.
Inquiry: Crucial for the performance of such applications is the **distribution strategy** that is used to partition data and computations over the cluster nodes. In some DSPFs, like Apache Spark or Flink, the distribution strategy is hardwired into the framework which can lead to inefficient applications. The other end of the spectrum is offered by Apache Storm, which offers a low-level model wherein programmers can implement their own distribution strategies on a per-application basis to improve efficiency. However, this model conflates distribution and data processing logic, making it difficult to modify either. As a consequence, today's cluster application developers either have to accept the built-in distribution strategies of a high-level framework or accept the complexity of expressing a distribution strategy in Storm's low-level model.
Approach: We propose a novel programming model wherein data processing operations and their distribution strategies are decoupled from one another and where new strategies can be created in a modular fashion.
Knowledge: The introduced language abstractions cleanly separate the data processing and distribution logic of a stream processing application. This enables the expression of stream processing applications in a high-level framework while still retaining the flexibility offered by Storm's low-level model.
Grounding: We implement our programming model as a domain-specific language, called Skitter, and use it to evaluate our approach. Our evaluation shows that Skitter enables the implementation of existing distribution strategies from the state of the art in a modular fashion. Our performance evaluation shows that the strategies implemented in Skitter exhibit the expected performance characteristics and that applications written in Skitter obtain throughput rates in the same order of magnitude as Storm.
Importance: Our work enables developers to select the most performant distribution strategy for each operation in their application, while still retaining the programming model offered by high-level frameworks.
△ Less
Submitted 27 February, 2025;
originally announced February 2025.
-
Reactive Programming without Functions
Authors:
Bjarno Oeyen,
Joeri De Koster,
Wolfgang De Meuter
Abstract:
Context: Reactive programming (RP) is a declarative programming paradigm suitable for expressing the handling of events. It enables programmers to create applications that react automatically to changes over time. Whenever a time-varying signal changes -- e.g. in response to values produced by event stream (e.g., sensor data, user input...) -- the program state is updated automatically in tandem w…
▽ More
Context: Reactive programming (RP) is a declarative programming paradigm suitable for expressing the handling of events. It enables programmers to create applications that react automatically to changes over time. Whenever a time-varying signal changes -- e.g. in response to values produced by event stream (e.g., sensor data, user input...) -- the program state is updated automatically in tandem with that change. This makes RP well-suited for building interactive applications and reactive (soft real-time) systems.
Inquiry: RP Language implementations are often built on top of an existing (host) language as an Embedded Domain Specific Language (EDSL). This results in application code in which reactive code and non-reactive code is inherently entangled. Using a mechanism known as lifting, one usually has access to the full feature set of the (non-reactive) host language in the RP program. However, lifting is also dangerous. First, host code expressed in a Turing-complete language may diverge, resulting in unresponsive programs: i.e. reactive programs that are not actually reactive. Second, the bi-directional integration of reactive and non-reactive code results in a paradigmatic mismatch that, when unchecked, leads to faulty behaviour in programs.
Approach: We propose a new reactive programming language, that has been meticulously designed to be reactive-only. We start with a simple (first-order) model for reactivity, based on reactors (i.e. uninstantiated descriptions of signals and their dependencies) and deployments (i.e. instances of reactors) that consist of signals. The language does not have the notion of functions, and thus unlike other RP languages there is no lifting either. We extend this simple model incrementally with additional features found in other programming languages, RP or otherwise. These features include stateful reactors (that allow for time-based accumulation), signals with dynamic dependencies by means of conditionals and polymorphic deployments, recursively-defined reactors, and (anonymous) reactors with lexical scope.
Knowledge: In our description of these language features, we not only describe the syntax and semantics, but also how each features compares to the problems that exist in (EDSL) RP languages. I.e. by starting from a reactive-only model, we identify which reactive features (that, in other RP languages are typically expressed in non-reactive code) affect the reactive guarantees that can be enforced by the language.
Grounding: We base our arguments by analysing the effect that each feature has on our language: e.g., by analysing how signals are updated, how they are created and how dependencies between signals can be affected. When applicable, we draw parallels with other languages: i.e. similarities shared with other RP languages will be highlighted and thoroughly analysed, and where relevant the same will also be done with non-reactive languages.
Importance: Our language shows how a purely reactive programming is able to express the same kinds of programs as in other RP languages that require the use of (unchecked) functions. By considering reactive programs as a collection of pure (reactive-only) reactors, we aim to increase how reactive programming is comprehended by both language designers and its users.
△ Less
Submitted 4 March, 2024;
originally announced March 2024.
-
Tackling the Awkward Squad for Reactive Programming: The Actor-Reactor Model
Authors:
Sam Van den Vonder,
Thierry Renaux,
Bjarno Oeyen,
Joeri De Koster,
Wolfgang De Meuter
Abstract:
Reactive programming is a programming paradigm whereby programs are internally represented by a dependency graph, which is used to automatically (re)compute parts of a program whenever its input changes. In practice reactive programming can only be used for some parts of an application: a reactive program is usually embedded in an application that is still written in ordinary imperative languages…
▽ More
Reactive programming is a programming paradigm whereby programs are internally represented by a dependency graph, which is used to automatically (re)compute parts of a program whenever its input changes. In practice reactive programming can only be used for some parts of an application: a reactive program is usually embedded in an application that is still written in ordinary imperative languages such as JavaScript or Scala. In this paper we investigate this embedding and we distill "the awkward squad for reactive programming" as 3 concerns that are essential for real-world software development, but that do not fit within reactive programming. They are related to long lasting computations, side-effects, and the coordination between imperative and reactive code. To solve these issues we design a new programming model called the Actor-Reactor Model in which programs are split up in a number of actors and reactors. Actors and reactors enforce a strict separation of imperative and reactive code, and they can be composed via a number of composition operators that make use of data streams. We demonstrate the model via our own implementation in a language called Stella.
△ Less
Submitted 21 June, 2023;
originally announced June 2023.
-
Advanced Join Patterns for the Actor Model based on CEP Techniques
Authors:
Humberto Rodriguez Avila,
Joeri De Koster,
Wolfgang De Meuter
Abstract:
Context: Actor-based programming languages offer many essential features for developing modern distributed reactive systems. These systems exploit the actor model's isolation property to fulfill their performance and scalability demands. Unfortunately, the reliance of the model on isolation as its most fundamental property requires programmers to express complex interaction patterns between their…
▽ More
Context: Actor-based programming languages offer many essential features for developing modern distributed reactive systems. These systems exploit the actor model's isolation property to fulfill their performance and scalability demands. Unfortunately, the reliance of the model on isolation as its most fundamental property requires programmers to express complex interaction patterns between their actors to be expressed manually in terms of complex combinations of messages sent between the isolated actors.
Inquiry: In the last three decades, several language design proposals have been introduced to reduce the complexity that emerges from describing said interaction and coordination of actors. We argue that none of these proposals is satisfactory in order to express the many complex interaction patterns between actors found in modern reactive distributed systems.
Approach: We describe seven smart home automation scenarios (in which an actor represents every smart home appliance) to motivate the support by actor languages for five radically different types of message synchronization patterns, which are lacking in modern distributed actor-based languages. Fortunately, these five types of synchronisation patterns have been studied extensively by the Complex Event Processing (CEP) community. Our paper describes how such CEP patterns are elegantly added to an actor-based programming language.
Knowledge: Based on our findings, we propose an extension of the single-message matching paradigm of contemporary actor-based languages in order to support a multiple-message matching way of thinking in the same way as proposed by CEP languages. Our proposal thus enriches the actor-model by ways of declaratively describing complex message combinations to which an actor can respond.
Grounding: We base the problem-statement of the paper on an online poll in the home automation community that has motivated the real need for the CEP-based synchronisation operators between actors proposed in the paper. Furthermore, we implemented a DSL -- called Sparrow -- that supports said operators and we argue quantitatively (in terms of LOC and in terms of a reduction of the concerns that have to be handled by programmers) that the DSL outperforms existing approaches.
Importance: This work aims to provide a set of synchronization operators that help actor-based languages to handle the complex interaction required by modern reactive distributed systems. To the best of our knowledge, our proposal is the first one to add advanced CEP synchronization operators to the -- relatively simplistic single-message based matching -- mechanisms of most actor-based languages.
△ Less
Submitted 30 October, 2020;
originally announced October 2020.
-
Search-based Tier Assignment for Optimising Offline Availability in Multi-tier Web Applications
Authors:
Laure Philips,
Joeri De Koster,
Wolfgang De Meuter,
Coen De Roover
Abstract:
Web programmers are often faced with several challenges in the development process of modern, rich internet applications. Technologies for the different tiers of the application have to be selected: a server-side language, a combination of JavaScript, HTML and CSS for the client, and a database technology. Meeting the expectations of contemporary web applications requires even more effort from the…
▽ More
Web programmers are often faced with several challenges in the development process of modern, rich internet applications. Technologies for the different tiers of the application have to be selected: a server-side language, a combination of JavaScript, HTML and CSS for the client, and a database technology. Meeting the expectations of contemporary web applications requires even more effort from the developer: many state of the art libraries must be mastered and glued together. This leads to an impedance mismatch problem between the different technologies and it is up to the programmer to align them manually. Multi-tier or tierless programming is a web programming paradigm that provides one language for the different tiers of the web application, allowing the programmer to focus on the actual program logic instead of the accidental complexity that comes from combining several technologies. While current tierless approaches therefore relieve the burden of having to combine different technologies into one application, the distribution of the code is explicitly tied into the program. Certain distribution decisions have an impact on crosscutting concerns such as information security or offline availability. Moreover, adapting the programs such that the application complies better with these concerns often leads to code tangling, rendering the program more difficult to understand and maintain. We introduce an approach to multi-tier programming where the tierless code is decoupled from the tier specification. The developer implements the web application in terms of slices and an external specification that assigns the slices to tiers. A recommender system completes the picture for those slices that do not have a fixed placement and proposes slice refinements as well. This recommender system tries to optimise the tier specification with respect to one or more crosscutting concerns. This is in contrast with current cutting edge solutions that hide distribution decisions from the programmer. In this paper we show that slices, together with a recommender system, enable the developer to experiment with different placements of slices, until the distribution of the code satisfies the programmer's needs. We present a search-based recommender system that maximises the offline availability of a web application and a concrete implementation of these concepts in the tier-splitting tool Stip.js.
△ Less
Submitted 4 December, 2017;
originally announced December 2017.
-
Towards Composable Concurrency Abstractions
Authors:
Janwillem Swalens,
Stefan Marr,
Joeri De Koster,
Tom Van Cutsem
Abstract:
In the past decades, many different programming models for managing concurrency in applications have been proposed, such as the actor model, Communicating Sequential Processes, and Software Transactional Memory. The ubiquity of multi-core processors has made harnessing concurrency even more important. We observe that modern languages, such as Scala, Clojure, or F#, provide not one, but multiple co…
▽ More
In the past decades, many different programming models for managing concurrency in applications have been proposed, such as the actor model, Communicating Sequential Processes, and Software Transactional Memory. The ubiquity of multi-core processors has made harnessing concurrency even more important. We observe that modern languages, such as Scala, Clojure, or F#, provide not one, but multiple concurrency models that help developers manage concurrency. Large end-user applications are rarely built using just a single concurrency model. Programmers need to manage a responsive UI, deal with file or network I/O, asynchronous workflows, and shared resources. Different concurrency models facilitate different requirements. This raises the issue of how these concurrency models interact, and whether they are composable. After all, combining different concurrency models may lead to subtle bugs or inconsistencies.
In this paper, we perform an in-depth study of the concurrency abstractions provided by the Clojure language. We study all pairwise combinations of the abstractions, noting which ones compose without issues, and which do not. We make an attempt to abstract from the specifics of Clojure, identifying the general properties of concurrency models that facilitate or hinder composition.
△ Less
Submitted 13 June, 2014;
originally announced June 2014.