-
FaaS Is Not Enough: Serverless Handling of Burst-Parallel Jobs
Authors:
Daniel Barcelona-Pons,
Aitor Arjona,
Pedro García-López,
Enrique Molina-Giménez,
Stepan Klymonchuk
Abstract:
Function-as-a-Service (FaaS) struggles with burst-parallel jobs due to needing multiple independent invocations to start a job. The lack of a group invocation primitive complicates application development and overlooks crucial aspects like locality and worker communication.
We introduce a new serverless solution designed specifically for burst-parallel jobs. Unlike FaaS, our solution ensures job…
▽ More
Function-as-a-Service (FaaS) struggles with burst-parallel jobs due to needing multiple independent invocations to start a job. The lack of a group invocation primitive complicates application development and overlooks crucial aspects like locality and worker communication.
We introduce a new serverless solution designed specifically for burst-parallel jobs. Unlike FaaS, our solution ensures job-level isolation using a group invocation primitive, allowing large groups of workers to be launched simultaneously. This method optimizes resource allocation by consolidating workers into fewer containers, speeding up their initialization and enhancing locality. Enhanced locality drastically reduces remote communication compared to FaaS, and combined with simultaneity, it enables workers to communicate synchronously via message passing and group collectives. This makes applications that are impractical with FaaS feasible. We implemented our solution on OpenWhisk, providing a communication middleware that efficiently uses locality with zero-copy messaging. Evaluations show that it reduces job invocation and communication latency, resulting in a 2$\times$ speed-up for TeraSort and a 98.5% reduction in remote communication for PageRank (13$\times$ speed-up) compared to traditional FaaS.
△ Less
Submitted 19 July, 2024;
originally announced July 2024.
-
Scaling a Variant Calling Genomics Pipeline with FaaS
Authors:
Aitor Arjona,
Arnau Gabriel-Atienza,
Sara Lanuza-Orna,
Xavier Roca-Canals,
Ayman Bourramouss,
Tyler K. Chafin,
Lucio Marcello,
Paolo Ribeca,
Pedro García-López
Abstract:
With the escalating complexity and volume of genomic data, the capacity of biology institutions' HPC faces limitations. While the Cloud presents a viable solution for short-term elasticity, its intricacies pose challenges for bioinformatics users. Alternatively, serverless computing allows for workload scalability with minimal developer burden. However, porting a scientific application to serverle…
▽ More
With the escalating complexity and volume of genomic data, the capacity of biology institutions' HPC faces limitations. While the Cloud presents a viable solution for short-term elasticity, its intricacies pose challenges for bioinformatics users. Alternatively, serverless computing allows for workload scalability with minimal developer burden. However, porting a scientific application to serverless is not a straightforward process. In this article, we present a Variant Calling genomics pipeline migrated from single-node HPC to a serverless architecture. We describe the inherent challenges of this approach and the engineering efforts required to achieve scalability. We contribute by open-sourcing the pipeline for future systems research and as a scalable user-friendly tool for the bioinformatics community.
△ Less
Submitted 12 December, 2023;
originally announced December 2023.
-
Transparent Serverless execution of Python multiprocessing applications
Authors:
Aitor Arjona,
Gerard Finol,
Pedro Garcia-Lopez
Abstract:
Access transparency means that both local and remote resources are accessed using identical operations. With transparency, unmodified single-machine applications could run over disaggregated compute, storage, and memory resources. Hiding the complexity of distributed systems through transparency would have great benefits, like scaling-out local-parallel scientific applications over flexible disagg…
▽ More
Access transparency means that both local and remote resources are accessed using identical operations. With transparency, unmodified single-machine applications could run over disaggregated compute, storage, and memory resources. Hiding the complexity of distributed systems through transparency would have great benefits, like scaling-out local-parallel scientific applications over flexible disaggregated resources in the Cloud.
This paper presents a performance evaluation where we assess the feasibility of access transparency over state-of-the-art Cloud disaggregated resources for Python multiprocessing applications. We have interfaced the multiprocessing module with an implementation that transparently runs processes on serverless functions and uses an in-memory data store for shared state.
To evaluate transparency, we run in the Cloud four unmodified applications: Uber Research's Evolution Strategies, Baselines-AI's Proximal Policy Optimization, Pandaral.lel's dataframe, and ScikitLearn's Hyperparameter tuning. We compare execution time and scalability of the same application running over disaggregated resources using our library, with the single-machine Python multiprocessing libraries in a large VM. For equal resources, applications efficiently using message-passing abstractions achieve comparable results despite the significant overheads of remote communication. Other shared-memory intensive applications do not perform due to high remote memory latency.
The results show that Python's multiprocessing library design is an enabler towards transparency: legacy applications using efficient disaggregated abstractions can transparently scale beyond VM limited resources for increased parallelism without changing the underlying code or architecture.
△ Less
Submitted 22 November, 2022; v1 submitted 18 May, 2022;
originally announced May 2022.
-
Triggerflow: Trigger-based Orchestration of Serverless Workflows
Authors:
Aitor Arjona,
Pedro García-López,
Josep Sampé,
Aleksander Slominski,
Lionel Villard
Abstract:
As more applications are being moved to the Cloud thanks to serverless computing, it is increasingly necessary to support the native life cycle execution of those applications in the data center. But existing cloud orchestration systems either focus on short-running workflows (like IBM Composer or Amazon Step Functions Express Workflows) or impose considerable overheads for synchronizing massively…
▽ More
As more applications are being moved to the Cloud thanks to serverless computing, it is increasingly necessary to support the native life cycle execution of those applications in the data center. But existing cloud orchestration systems either focus on short-running workflows (like IBM Composer or Amazon Step Functions Express Workflows) or impose considerable overheads for synchronizing massively parallel jobs (Azure Durable Functions, Amazon Step Functions). None of them are open systems enabling extensible interception and optimization of custom workflows. We present Triggerflow: an extensible Trigger-based Orchestration architecture for serverless workflows. We demonstrate that Triggerflow is a novel serverless building block capable of constructing different reactive orchestrators (State Machines, Directed Acyclic Graphs, Workflow as code, Federated Learning orchestrator). We also validate that it can support high-volume event processing workloads, auto-scale on demand with scale down to zero when not used, and transparently guarantee fault tolerance and efficient resource usage when orchestrating long running scientific workflows.
△ Less
Submitted 22 June, 2021; v1 submitted 1 June, 2021;
originally announced June 2021.
-
Triggerflow: Trigger-based Orchestration of Serverless Workflows
Authors:
Pedro García-López,
Aitor Arjona,
Josep Sampe,
Aleksander Slominski,
Lionel Villard
Abstract:
As more applications are being moved to the Cloud thanks to serverless computing, it is increasingly necessary to support native life cycle execution of those applications in the data center. But existing systems either focus on short-running workflows (like IBM Composer or Amazon Express Workflows) or impose considerable overheads for synchronizing massively parallel jobs (Azure Durable Functions…
▽ More
As more applications are being moved to the Cloud thanks to serverless computing, it is increasingly necessary to support native life cycle execution of those applications in the data center. But existing systems either focus on short-running workflows (like IBM Composer or Amazon Express Workflows) or impose considerable overheads for synchronizing massively parallel jobs (Azure Durable Functions, Amazon Step Functions, Google Cloud Composer). None of them are open systems enabling extensible interception and optimization of custom workflows. We present Triggerflow: an extensible Trigger-based Orchestration architecture for serverless workflows built on top of Knative Eventing and Kubernetes technologies. We demonstrate that Triggerflow is a novel serverless building block capable of constructing different reactive schedulers (State Machines, Directed Acyclic Graphs, Workflow as code). We also validate that it can support high-volume event processing workloads, auto-scale on demand and transparently optimize scientific workflows.
△ Less
Submitted 17 June, 2020; v1 submitted 15 June, 2020;
originally announced June 2020.