-
Unlocking True Elasticity for the Cloud-Native Era with Dandelion
Authors:
Tom Kuchler,
Pinghe Li,
Yazhuo Zhang,
Lazar Cvetković,
Boris Goranov,
Tobias Stocker,
Leon Thomm,
Simone Kalbermatter,
Tim Notter,
Andrea Lattuada,
Ana Klimovic
Abstract:
Elasticity is fundamental to cloud computing, as it enables quickly allocating resources to match the demand of each workload as it arrives, rather than pre-provisioning resources to meet performance objectives. However, even serverless platforms -- which boot sandboxes in 10s to 100s of milliseconds -- are not sufficiently elastic to avoid over-provisioning expensive resources. Today's FaaS platf…
▽ More
Elasticity is fundamental to cloud computing, as it enables quickly allocating resources to match the demand of each workload as it arrives, rather than pre-provisioning resources to meet performance objectives. However, even serverless platforms -- which boot sandboxes in 10s to 100s of milliseconds -- are not sufficiently elastic to avoid over-provisioning expensive resources. Today's FaaS platforms rely on pre-provisioning many idle sandboxes in memory to reduce the occurrence of slow, cold starts. A key obstacle for high elasticity is booting a guest OS and configuring features like networking in sandboxes, which are required to expose an isolated POSIX-like interface to user functions. Our key insight is that redesigning the interface for applications in the cloud-native era enables co-designing a much more efficient and elastic execution system. Now is a good time to rethink cloud abstractions as developers are building applications to be cloud-native. Cloud-native applications typically consist of user-provided compute logic interacting with cloud services (for storage, AI inference, query processing, etc) exposed over REST APIs. Hence, we propose Dandelion, an elastic cloud platform with a declarative programming model that expresses applications as DAGs of pure compute functions and higher-level communication functions. Dandelion can securely execute untrusted user compute functions in lightweight sandboxes that cold start in hundreds of microseconds, since pure functions do not rely on extra software environments such as a guest OS. Dandelion makes it practical to boot a sandbox on-demand for each request, decreasing performance variability by two to three orders of magnitude compared to Firecracker and reducing committed memory by 96% on average when running the Azure Functions trace.
△ Less
Submitted 2 May, 2025;
originally announced May 2025.
-
Verus: Verifying Rust Programs using Linear Ghost Types (extended version)
Authors:
Andrea Lattuada,
Travis Hance,
Chanhee Cho,
Matthias Brun,
Isitha Subasinghe,
Yi Zhou,
Jon Howell,
Bryan Parno,
Chris Hawblitzel
Abstract:
The Rust programming language provides a powerful type system that checks linearity and borrowing, allowing code to safely manipulate memory without garbage collection and making Rust ideal for developing low-level, high-assurance systems. For such systems, formal verification can be useful to prove functional correctness properties beyond type safety. This paper presents Verus, an SMT-based tool…
▽ More
The Rust programming language provides a powerful type system that checks linearity and borrowing, allowing code to safely manipulate memory without garbage collection and making Rust ideal for developing low-level, high-assurance systems. For such systems, formal verification can be useful to prove functional correctness properties beyond type safety. This paper presents Verus, an SMT-based tool for formally verifying Rust programs. With Verus, programmers express proofs and specifications using the Rust language, allowing proofs to take advantage of Rust's linear types and borrow checking. We show how this allows proofs to manipulate linearly typed permissions that let Rust code safely manipulate memory, pointers, and concurrent resources. Verus organizes proofs and specifications using a novel mode system that distinguishes specifications, which are not checked for linearity and borrowing, from executable code and proofs, which are checked for linearity and borrowing. We formalize Verus' linearity, borrowing, and modes in a small lambda calculus, for which we prove type safety and termination of specifications and proofs. We demonstrate Verus on a series of examples, including pointer-manipulating code (an xor-based doubly linked list), code with interior mutability, and concurrent code.
△ Less
Submitted 10 March, 2023; v1 submitted 9 March, 2023;
originally announced March 2023.
-
Timestamp tokens: a better coordination primitive for data-processing systems
Authors:
Andrea Lattuada,
Frank McSherry
Abstract:
Distributed data processing systems have advanced through models that expose more and more opportunities for concurrency within a computation. The scheduling of these increasingly sophisticated models has become the bottleneck for improved throughput and reduced latency.
We present a new coordination primitive for dataflow systems, the timestamp token, which minimizes the volume of information s…
▽ More
Distributed data processing systems have advanced through models that expose more and more opportunities for concurrency within a computation. The scheduling of these increasingly sophisticated models has become the bottleneck for improved throughput and reduced latency.
We present a new coordination primitive for dataflow systems, the timestamp token, which minimizes the volume of information shared between the computation and host system, without surrendering precision about concurrency. Several projects have now used timestamp tokens, and were able to explore computational idioms that could not be expressed easily, if at all, in other platforms. Importantly, these projects did not need to design and implement whole systems to support their research.
△ Less
Submitted 12 October, 2022;
originally announced October 2022.
-
Shared Arrangements: practical inter-query sharing for streaming dataflows
Authors:
Frank McSherry,
Andrea Lattuada,
Malte Schwarzkopf,
Timothy Roscoe
Abstract:
Current systems for data-parallel, incremental processing and view maintenance over high-rate streams isolate the execution of independent queries. This creates unwanted redundancy and overhead in the presence of concurrent incrementally maintained queries: each query must independently maintain the same indexed state over the same input streams, and new queries must build this state from scratch…
▽ More
Current systems for data-parallel, incremental processing and view maintenance over high-rate streams isolate the execution of independent queries. This creates unwanted redundancy and overhead in the presence of concurrent incrementally maintained queries: each query must independently maintain the same indexed state over the same input streams, and new queries must build this state from scratch before they can begin to emit their first results. This paper introduces shared arrangements: indexed views of maintained state that allow concurrent queries to reuse the same in-memory state without compromising data-parallel performance and scaling. We implement shared arrangements in a modern stream processor and show order-of-magnitude improvements in query response time and resource consumption for interactive queries against high-throughput streams, while also significantly improving performance in other domains including business analytics, graph processing, and program analysis.
△ Less
Submitted 11 June, 2020; v1 submitted 6 December, 2018;
originally announced December 2018.
-
Megaphone: Latency-conscious state migration for distributed streaming dataflows
Authors:
Moritz Hoffmann,
Andrea Lattuada,
Frank McSherry,
Vasiliki Kalavri,
John Liagouris,
Timothy Roscoe
Abstract:
We design and implement Megaphone, a data migration mechanism for stateful distributed dataflow engines with latency objectives. When compared to existing migration mechanisms, Megaphone has the following differentiating characteristics: (i) migrations can be subdivided to a configurable granularity to avoid latency spikes, and (ii) migrations can be prepared ahead of time to avoid runtime coordin…
▽ More
We design and implement Megaphone, a data migration mechanism for stateful distributed dataflow engines with latency objectives. When compared to existing migration mechanisms, Megaphone has the following differentiating characteristics: (i) migrations can be subdivided to a configurable granularity to avoid latency spikes, and (ii) migrations can be prepared ahead of time to avoid runtime coordination. Megaphone is implemented as a library on an unmodified timely dataflow implementation, and provides an operator interface compatible with its existing APIs. We evaluate Megaphone on established benchmarks with varying amounts of state and observe that compared to naïve approaches Megaphone reduces service latencies during reconfiguration by orders of magnitude without significantly increasing steady-state overhead.
△ Less
Submitted 16 April, 2019; v1 submitted 4 December, 2018;
originally announced December 2018.