Exploiting Inherent Elasticity of Serverless in Irregular Algorithms
Authors:
Gerard Finol,
Gerard París,
Pedro García-López,
Marc Sánchez-Artigas
Abstract:
Serverless computing, in particular the Function-as-a-Service (FaaS) execution model, has recently been shown to be effective for running large-scale computations. However, little attention has been paid to highly parallel applications with unbalanced and irregular workloads. Typically, these workloads have been kept out of the cloud because their computing resources cannot be anticipated ahead of time, which frequently leads to severe resource over- and underprovisioning. Our main insight in this article is, however, that the elasticity and ease of management of serverless computing technology can be a key enabler for effectively running these problematic workloads in the cloud for the first time. More concretely, we demonstrate that with a simple serverless executor pool abstraction one can achieve a better cost-performance trade-off than a Spark cluster of static size built upon large EC2 virtual machines. To support this conclusion, we evaluate three irregular algorithms: Unbalanced Tree Search (UTS), Mandelbrot Set using the Mariani-Silver algorithm, and Betweenness Centrality (BC) on a random graph. For instance, our serverless implementation of UTS is able to outperform Spark by up to 55% at the same cost. We also show that a serverless environment can outperform a large EC2 instance on the BC algorithm by 10% using the same number of virtual CPUs. This provides the first concrete evidence that highly parallel, irregular workloads can be efficiently executed using purely stateless functions with almost zero burden on users, i.e., no need for users to understand non-obvious system-level parameters and optimizations. Furthermore, we show that UTS can benefit from the FaaS pay-as-you-go billing model, which for the first time makes it worthwhile to enable certain application-level optimizations that can lead to significant improvements (e.g., 41%) with a negligible increase in cost.
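As an illustration of the executor pool idea mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes a local `ThreadPoolExecutor` as a stand-in for a pool of serverless functions, and the tree generator (`children`, driven by a SHA-1 hash) is a hypothetical toy in the spirit of Unbalanced Tree Search. Each task is stateless, and new tasks are submitted as the tree unfolds, so the amount of parallel work is not known ahead of time.

```python
import hashlib
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

MAX_DEPTH = 6  # assumption: cap the depth so the toy tree stays small


def children(node_id: str, depth: int) -> list[str]:
    """Derive an irregular number of children (0 to 3) from a hash of the node id."""
    if depth >= MAX_DEPTH:
        return []
    digest = hashlib.sha1(node_id.encode()).digest()
    return [f"{node_id}.{i}" for i in range(digest[0] % 4)]


def expand(node_id: str, depth: int) -> list[tuple[str, int]]:
    """One stateless task: expand a single node and return its children."""
    return [(child, depth + 1) for child in children(node_id, depth)]


def count_nodes_pool(root: str = "root", workers: int = 8) -> int:
    """Count tree nodes by submitting one task per node to an executor pool."""
    count = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pending = {pool.submit(expand, root, 0)}
        while pending:
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                count += 1  # one finished task corresponds to one visited node
                for child, depth in future.result():
                    pending.add(pool.submit(expand, child, depth))
    return count


def count_nodes_sequential(root: str = "root") -> int:
    """Reference single-threaded traversal used to check the pool version."""
    count, stack = 0, [(root, 0)]
    while stack:
        node_id, depth = stack.pop()
        count += 1
        stack.extend(expand(node_id, depth))
    return count
```

In a serverless setting the pool would grow and shrink with the number of pending tasks rather than being fixed by `workers`; that elasticity is what the sketch cannot show locally.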
Submitted 30 June, 2022;
originally announced June 2022.
Transparent Serverless execution of Python multiprocessing applications
Authors:
Aitor Arjona,
Gerard Finol,
Pedro Garcia-Lopez
Abstract:
Access transparency means that both local and remote resources are accessed using identical operations. With transparency, unmodified single-machine applications could run over disaggregated compute, storage, and memory resources. Hiding the complexity of distributed systems through transparency would have great benefits, like scaling-out local-parallel scientific applications over flexible disaggregated resources in the Cloud.
This paper presents a performance evaluation where we assess the feasibility of access transparency over state-of-the-art Cloud disaggregated resources for Python multiprocessing applications. We have interfaced the multiprocessing module with an implementation that transparently runs processes on serverless functions and uses an in-memory data store for shared state.
To evaluate transparency, we run four unmodified applications in the Cloud: Uber Research's Evolution Strategies, Baselines-AI's Proximal Policy Optimization, Pandaral.lel's dataframe, and ScikitLearn's Hyperparameter tuning. We compare the execution time and scalability of each application running over disaggregated resources with our library against the same application using the single-machine Python multiprocessing library in a large VM. For equal resources, applications that use message-passing abstractions efficiently achieve comparable results despite the significant overheads of remote communication. Other, shared-memory-intensive applications do not perform well due to high remote memory latency.
The results show that Python's multiprocessing library design is an enabler of transparency: legacy applications that use efficient disaggregated abstractions can transparently scale beyond the limited resources of a single VM for increased parallelism, without changing the underlying code or architecture.
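To make the notion of access transparency concrete, here is a minimal sketch under stated assumptions; it does not show the paper's library. The application function `sum_of_squares` is hypothetical and is written only against the `Pool` interface of Python's `multiprocessing` module. The standard-library module `multiprocessing.dummy`, which exposes the same interface backed by threads, stands in for an alternative backend to show that the application code itself does not change when the implementation behind the interface does.

```python
import multiprocessing.dummy as thread_backend  # stdlib: same Pool API, backed by threads


def square(x: int) -> int:
    """Unit of work executed by each pool worker."""
    return x * x


def sum_of_squares(mp, values, workers: int = 4) -> int:
    """Application code that depends only on the multiprocessing-style interface.

    `mp` is any module or object exposing a compatible `Pool`; which backend
    actually runs the workers is invisible to this function.
    """
    with mp.Pool(workers) as pool:
        return sum(pool.map(square, values))


# Stand-in backend for illustration; the call site is the only place that
# names a concrete implementation.
result = sum_of_squares(thread_backend, range(10))
```

Passing the standard `multiprocessing` module as `mp` would run the same function with local processes. A backend that maps `Pool` workers to remote functions would, in principle, slot in the same way, which is the property the abstract describes.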
Submitted 22 November, 2022; v1 submitted 18 May, 2022;
originally announced May 2022.