-
Power Ramp-Rate Control via Power Regulation for Storageless Grid-Connected Photovoltaic Systems
Authors:
Jose Miguel Riquelme-Dominguez,
Francisco de Paula García-López,
Sergio Martinez
Abstract:
Photovoltaic Power Ramp-Rate Control (PRRC) constitutes a key ancillary service for future power systems. Although its implementation through the installation of storage systems or irradiance sensors has been widely investigated, fewer studies have explored the power curtailment approach. The latter is less efficient, as it deliberately sheds available power, yet it is a cost-effective solution in terms of capital expenditures. This paper proposes a novel storageless and sensorless photovoltaic PRRC for grid-connected applications in which the photovoltaic power, rather than the voltage, is the controlled magnitude. This makes it possible to track the power ramp-rate limit more effectively than existing methods in the literature. The method is assisted by a real-time curve-fitting algorithm that estimates the Maximum Power Point while operating suboptimally. Thus, no direct temperature or irradiance measurement systems are needed. The proposed PRRC strategy has been validated by simulation and compared with another approach from the literature, using highly variable real-field irradiance data. Experimental validation of the proposed strategy has been performed in real time via Controller Hardware-in-the-Loop.
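To make the idea of using power (rather than voltage) as the controlled magnitude concrete, the sketch below limits how fast a power reference may change between control steps. It is a minimal illustration under stated assumptions, not the authors' controller: the Maximum Power Point (MPP) estimates are simply passed in as a list (the paper obtains them with a real-time curve-fitting algorithm, which is not reproduced here), units are assumed to be watts and seconds, and `ramp_limited_reference` is a hypothetical helper name.

```python
from typing import List


def ramp_limited_reference(
    p_mpp_estimates: List[float],
    p_initial: float,
    max_ramp_w_per_s: float,
    dt_s: float,
) -> List[float]:
    """Hypothetical helper: follow the estimated MPP power while limiting
    how fast the power reference may change between control steps."""
    max_step = max_ramp_w_per_s * dt_s  # largest allowed change per step (W)
    p_ref = p_initial
    references = []
    for p_mpp in p_mpp_estimates:
        # Move toward the power currently available at the MPP...
        delta = p_mpp - p_ref
        # ...but clip the change to +/- max_step (the ramp-rate limit).
        delta = max(-max_step, min(max_step, delta))
        p_ref += delta
        # Without storage the array cannot deliver more than its MPP power,
        # so in this sketch a sudden irradiance drop pulls the reference down at once.
        p_ref = min(p_ref, p_mpp)
        references.append(p_ref)
    return references


# Irradiance jump: the MPP estimate goes from 0 W to 1000 W at once.
print(ramp_limited_reference([1000.0] * 5, 0.0, max_ramp_w_per_s=100.0, dt_s=1.0))
```

With a 100 W/s limit and a 1 s control step, the reference climbs 100 W per step ([100.0, 200.0, 300.0, 400.0, 500.0] for the first five steps) instead of jumping to 1000 W. Clipping to the MPP reflects the storageless setting: curtailment can slow upward ramps, but in this simple sketch a sudden drop in irradiance passes straight through.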
Submitted 20 January, 2025;
originally announced January 2025.
-
FaaS Is Not Enough: Serverless Handling of Burst-Parallel Jobs
Authors:
Daniel Barcelona-Pons,
Aitor Arjona,
Pedro García-López,
Enrique Molina-Giménez,
Stepan Klymonchuk
Abstract:
Function-as-a-Service (FaaS) struggles with burst-parallel jobs because starting a job requires multiple independent invocations. The lack of a group invocation primitive complicates application development and overlooks crucial aspects such as locality and worker communication.
We introduce a new serverless solution designed specifically for burst-parallel jobs. Unlike FaaS, our solution ensures job-level isolation using a group invocation primitive, allowing large groups of workers to be launched simultaneously. This method optimizes resource allocation by consolidating workers into fewer containers, which speeds up their initialization and enhances locality. Enhanced locality drastically reduces remote communication compared to FaaS and, combined with simultaneity, enables workers to communicate synchronously via message passing and group collectives. This makes feasible applications that are impractical with FaaS. We implemented our solution on OpenWhisk, providing a communication middleware that efficiently exploits locality with zero-copy messaging. Evaluations show that it reduces job invocation and communication latency, resulting in a 2× speed-up for TeraSort and a 98.5% reduction in remote communication for PageRank (13× speed-up) compared to traditional FaaS.
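The locality argument can be illustrated with simple counting. The sketch below assumes all-to-all communication among the workers of one job and a hypothetical placement policy that fills each container before opening the next; `pack_workers` and `remote_pair_fraction` are illustrative names, not part of OpenWhisk or of the authors' middleware.

```python
from itertools import combinations
from typing import List


def pack_workers(num_workers: int, workers_per_container: int) -> List[int]:
    """Hypothetical placement: fill each container before opening the next.
    Returns the container id assigned to each worker."""
    return [w // workers_per_container for w in range(num_workers)]


def remote_pair_fraction(placement: List[int]) -> float:
    """Fraction of worker pairs whose all-to-all traffic crosses containers."""
    pairs = list(combinations(range(len(placement)), 2))
    if not pairs:
        return 0.0
    remote = sum(1 for a, b in pairs if placement[a] != placement[b])
    return remote / len(pairs)


# FaaS-style: one worker per container, so every pair is remote.
print(remote_pair_fraction(pack_workers(8, 1)))
# Group invocation packing 4 workers per container.
print(remote_pair_fraction(pack_workers(8, 4)))
```

With 8 workers, one per container (the FaaS-like case), every pair communicates remotely; packing 4 workers per container leaves 16 of the 28 pairs remote, and a single container leaves none. Real traffic also depends on message sizes and the collective algorithm, which this count ignores.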
Submitted 19 July, 2024;
originally announced July 2024.
-
Scaling a Variant Calling Genomics Pipeline with FaaS
Authors:
Aitor Arjona,
Arnau Gabriel-Atienza,
Sara Lanuza-Orna,
Xavier Roca-Canals,
Ayman Bourramouss,
Tyler K. Chafin,
Lucio Marcello,
Paolo Ribeca,
Pedro García-López
Abstract:
With the escalating complexity and volume of genomic data, the HPC capacity of biology institutions is increasingly strained. While the Cloud offers a viable solution for short-term elasticity, its intricacies pose challenges for bioinformatics users. Alternatively, serverless computing allows workloads to scale with minimal developer burden. However, porting a scientific application to serverless is not a straightforward process. In this article, we present a Variant Calling genomics pipeline migrated from single-node HPC to a serverless architecture. We describe the inherent challenges of this approach and the engineering efforts required to achieve scalability. We release the pipeline as open source, both for future systems research and as a scalable, user-friendly tool for the bioinformatics community.
Submitted 12 December, 2023;
originally announced December 2023.
-
A milestone for FaaS pipelines; object storage vs VM-driven data exchange
Authors:
Germán T. Eizaguirre,
Marc Sánchez-Artigas,
Pedro García-López
Abstract:
Serverless functions provide high levels of parallelism, short startup times, and "pay-as-you-go" billing. These attributes make them a natural substrate for data analytics workflows. However, because functions cannot communicate directly, executing workflows is challenging. The current practice for sharing intermediate data among functions is remote object storage (e.g., IBM COS). Yet the performance of object storage is not well understood, and it can defy conventional wisdom: for instance, object storage can even outperform seemingly simpler approaches, such as executing shuffle stages (e.g., GroupBy) inside powerful VMs to avoid all-to-all transfers between functions. Leveraging a genomics pipeline, we show that object storage is a reasonable choice for data passing when an appropriate number of functions is used in shuffling stages.
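The all-to-all exchange that makes shuffles expensive can be sketched as a hash-partitioned shuffle through object storage. In this minimal sketch a Python dict stands in for a bucket (it is not the IBM COS API), and the `shuffle/map-{m}/part-{r}` key layout is a hypothetical naming scheme. Each of the M mappers writes one object per reducer and each of the R reducers reads one object per mapper, so a stage produces M × R objects; this is one reason the number of functions chosen for shuffling stages matters.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[str, int]


def partition(key: str, num_reducers: int) -> int:
    # Stable hash: Python's built-in hash() of str is randomized per process,
    # which would send the same key to different reducers across functions.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def map_phase(store: Dict[str, List[Record]], mapper_id: int,
              records: List[Record], num_reducers: int) -> None:
    buckets: Dict[int, List[Record]] = defaultdict(list)
    for key, value in records:
        buckets[partition(key, num_reducers)].append((key, value))
    # One object per (mapper, reducer) pair, even if empty, so every
    # reducer can read a fixed, predictable set of keys.
    for r in range(num_reducers):
        store[f"shuffle/map-{mapper_id}/part-{r}"] = buckets.get(r, [])


def reduce_phase(store: Dict[str, List[Record]], reducer_id: int,
                 num_mappers: int) -> Dict[str, List[int]]:
    # GroupBy: gather this reducer's partition from every mapper's output.
    groups: Dict[str, List[int]] = defaultdict(list)
    for m in range(num_mappers):
        for key, value in store[f"shuffle/map-{m}/part-{reducer_id}"]:
            groups[key].append(value)
    return dict(groups)


store: Dict[str, List[Record]] = {}  # stand-in for an object storage bucket
inputs = [[("a", 1), ("b", 2), ("a", 3)], [("b", 4), ("c", 5)]]
for m, chunk in enumerate(inputs):
    map_phase(store, m, chunk, num_reducers=3)
print(len(store))  # 2 mappers x 3 reducers = 6 objects
```

Because each key hashes to exactly one reducer, the reducers' outputs are disjoint and together form the complete grouping.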
Submitted 22 June, 2022;
originally announced July 2022.
-
Exploiting Inherent Elasticity of Serverless in Irregular Algorithms
Authors:
Gerard Finol,
Gerard París,
Pedro García-López,
Marc Sánchez-Artigas
Abstract:
Serverless computing, in particular the Function-as-a-Service (FaaS) execution model, has recently been shown to be effective for running large-scale computations. However, little attention has been paid to highly parallel applications with unbalanced and irregular workloads. Typically, these workloads have been kept out of the cloud because their computing resources cannot be anticipated ahead of time, which frequently leads to severe over- and under-provisioning. Our main insight in this article, however, is that the elasticity and ease of management of serverless computing can be a key enabler for running these problematic workloads effectively in the cloud for the first time. More concretely, we demonstrate that with a simple serverless executor pool abstraction, one can achieve a better cost-performance trade-off than with a statically sized Spark cluster built on large EC2 virtual machines. To support this conclusion, we evaluate three irregular algorithms: Unbalanced Tree Search (UTS), the Mandelbrot Set using the Mariani-Silver algorithm, and Betweenness Centrality (BC) on a random graph. For instance, our serverless implementation of UTS outperforms Spark by up to 55% at the same cost. We also show that a serverless environment can outperform a large EC2 instance by 10% on the BC algorithm using the same number of virtual CPUs. This provides the first concrete evidence that highly parallel, irregular workloads can be executed efficiently using purely stateless functions with almost zero burden on users, i.e., with no need for users to understand non-obvious system-level parameters and optimizations. Furthermore, we show that UTS can benefit from the FaaS pay-as-you-go billing model, which for the first time makes it worthwhile to enable certain application-level optimizations that can lead to significant improvements (e.g., 41%) with a negligible increase in cost.
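The executor-pool idea can be sketched locally. In the sketch below, `ThreadPoolExecutor` stands in for a pool of serverless functions, and the tree is a hypothetical unbalanced tree whose fan-out comes from a SHA-256 hash of each node id (in the spirit of UTS, but not the UTS benchmark's actual generator). Each task explores at most `budget` nodes and hands back its unexplored frontier, which the driver resubmits as new tasks, so the amount of parallel work grows and shrinks with the tree rather than being fixed in advance.

```python
import hashlib
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from typing import List, Tuple


def num_children(node: str, max_depth: int) -> int:
    """Hypothetical unbalanced tree: the fan-out (0-3) comes from a hash of the
    node id, so the tree's shape cannot be known before it is explored."""
    if node.count(".") >= max_depth:
        return 0
    return hashlib.sha256(node.encode("utf-8")).digest()[0] % 4


def explore(root: str, max_depth: int, budget: float) -> Tuple[int, List[str]]:
    """One task: depth-first search from `root`, visiting at most `budget`
    nodes. Returns the visit count and the unexplored frontier."""
    stack, visited = [root], 0
    while stack and visited < budget:
        node = stack.pop()
        visited += 1
        stack.extend(f"{node}.{i}" for i in range(num_children(node, max_depth)))
    return visited, stack


def count_nodes(max_depth: int, budget: int, workers: int) -> int:
    """Driver: keep resubmitting frontier nodes as new tasks until no work is left."""
    total = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pending = {pool.submit(explore, "r", max_depth, budget)}
        while pending:
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                visited, frontier = future.result()
                total += visited
                # Work discovered at run time becomes new tasks for the pool.
                for node in frontier:
                    pending.add(pool.submit(explore, node, max_depth, budget))
    return total


print(count_nodes(max_depth=8, budget=5, workers=4))
```

The result does not depend on `budget` or `workers`; those parameters only change how the work is split into tasks and how many tasks run concurrently.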
Submitted 30 June, 2022;
originally announced June 2022.
-
Transparent Serverless execution of Python multiprocessing applications
Authors:
Aitor Arjona,
Gerard Finol,
Pedro Garcia-Lopez
Abstract:
Access transparency means that both local and remote resources are accessed using identical operations. With transparency, unmodified single-machine applications could run over disaggregated compute, storage, and memory resources. Hiding the complexity of distributed systems through transparency would have great benefits, such as scaling out local-parallel scientific applications over flexible disaggregated resources in the Cloud.
This paper presents a performance evaluation that assesses the feasibility of access transparency over state-of-the-art Cloud disaggregated resources for Python multiprocessing applications. We have interfaced the multiprocessing module with an implementation that transparently runs processes on serverless functions and uses an in-memory data store for shared state.
To evaluate transparency, we run four unmodified applications in the Cloud: Uber Research's Evolution Strategies, Baselines-AI's Proximal Policy Optimization, Pandaral.lel's dataframe, and scikit-learn's hyperparameter tuning. We compare the execution time and scalability of each application running over disaggregated resources using our library with those of the single-machine Python multiprocessing library in a large VM. For equal resources, applications that efficiently use message-passing abstractions achieve comparable results despite the significant overheads of remote communication. Other, shared-memory-intensive applications perform poorly due to high remote memory latency.
The results show that the design of Python's multiprocessing library is an enabler of transparency: legacy applications using efficient disaggregated abstractions can transparently scale beyond the limited resources of a single VM for increased parallelism without changing the underlying code or architecture.
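Access transparency hinges on an application depending only on an interface, not on where that interface executes. The sketch below uses only the standard library: `run` relies on the `Pool` interface (the `processes` argument, `map`, and the context-manager protocol), so the same function works with `multiprocessing.pool.ThreadPool` or `multiprocessing.Pool`. It does not use the authors' library; the idea that a serverless backend would be one more drop-in factory with this interface is an assumption made for illustration.

```python
from multiprocessing.pool import ThreadPool
from typing import Callable, List


def evaluate(candidate: int) -> int:
    # Stand-in for an independent, expensive task (e.g., one hyperparameter trial).
    return candidate * candidate


def run(pool_factory: Callable, workers: int = 4) -> List[int]:
    # The application relies only on the Pool interface: the constructor's
    # `processes` argument, `map`, and the context-manager protocol.
    with pool_factory(processes=workers) as pool:
        return pool.map(evaluate, range(8))


if __name__ == "__main__":
    print(run(ThreadPool))  # worker threads inside this process
```

Passing `multiprocessing.Pool` instead of `ThreadPool` (under the same `if __name__ == "__main__":` guard) runs the identical code in separate processes.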
Submitted 22 November, 2022; v1 submitted 18 May, 2022;
originally announced May 2022.
-
Triggerflow: Trigger-based Orchestration of Serverless Workflows
Authors:
Aitor Arjona,
Pedro García-López,
Josep Sampé,
Aleksander Slominski,
Lionel Villard
Abstract:
As more applications are moved to the Cloud thanks to serverless computing, it is increasingly necessary to support the native life cycle execution of those applications in the data center. But existing cloud orchestration systems either focus on short-running workflows (like IBM Composer or Amazon Step Functions Express Workflows) or impose considerable overheads for synchronizing massively parallel jobs (Azure Durable Functions, Amazon Step Functions). None of them are open systems enabling extensible interception and optimization of custom workflows. We present Triggerflow: an extensible Trigger-based Orchestration architecture for serverless workflows. We demonstrate that Triggerflow is a novel serverless building block capable of constructing different reactive orchestrators (State Machines, Directed Acyclic Graphs, Workflow as code, Federated Learning orchestrator). We also validate that it can support high-volume event processing workloads, auto-scale on demand (scaling down to zero when not in use), and transparently guarantee fault tolerance and efficient resource usage when orchestrating long-running scientific workflows.
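The trigger-based model can be illustrated with an event-condition-action rule that joins a parallel step: the trigger counts termination events and fires its action only after the last one arrives. All names below (`Trigger`, `TriggerService`, `join_condition`, the event fields) are hypothetical and do not reproduce Triggerflow's API; the sketch runs in memory, without the persistence and fault tolerance the abstract describes.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]
Context = Dict[str, Any]


@dataclass
class Trigger:
    """Hypothetical event-condition-action rule (names are illustrative)."""
    subject: str                                     # events this trigger reacts to
    condition: Callable[[Context, Event], bool]     # decides whether to fire
    action: Callable[[Context, Event], None]        # e.g. invoke the next step
    context: Context = field(default_factory=dict)  # state kept between events
    fired: bool = False


class TriggerService:
    def __init__(self) -> None:
        self.triggers: List[Trigger] = []

    def add_trigger(self, trigger: Trigger) -> None:
        self.triggers.append(trigger)

    def publish(self, event: Event) -> None:
        # Route the event to every not-yet-fired trigger on its subject.
        for trigger in self.triggers:
            if trigger.fired or trigger.subject != event["subject"]:
                continue
            if trigger.condition(trigger.context, event):
                trigger.fired = True
                trigger.action(trigger.context, event)


def join_condition(expected: int) -> Callable[[Context, Event], bool]:
    # True only once `expected` events have been counted,
    # i.e. when all functions of a parallel map step have finished.
    def condition(context: Context, event: Event) -> bool:
        context["count"] = context.get("count", 0) + 1
        return context["count"] == expected
    return condition


launched: List[str] = []
service = TriggerService()
service.add_trigger(Trigger(
    subject="map-step",
    condition=join_condition(expected=3),
    action=lambda ctx, ev: launched.append("reduce-step"),
))
for i in range(3):
    service.publish({"subject": "map-step", "type": "termination", "task": i})
print(launched)  # ['reduce-step']
```

Keeping the join state in the trigger's context, rather than in a waiting function, is what lets such an orchestrator hold no running compute between events.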
Submitted 22 June, 2021; v1 submitted 1 June, 2021;
originally announced June 2021.
-
Benchmarking Parallelism in FaaS Platforms
Authors:
Daniel Barcelona-Pons,
Pedro García-López
Abstract:
Serverless computing has seen a myriad of work exploring its potential. Some systems exploit the automatic elasticity and scalability of Function-as-a-Service (FaaS) to run highly parallel computing jobs. However, they focus on specific platforms and imply that their ideas can be extrapolated to any FaaS runtime.
An important question arises: do all FaaS platforms fit parallel computations? In this paper, we argue that not all of them provide the necessary means to host highly parallel applications. To validate our hypothesis, we create a comparative framework and categorize the architectures of four cloud FaaS offerings, emphasizing parallel performance. We confirm and extend this description with an empirical experiment that traces, in fine detail, the evolution of a parallel computing job on each service.
Our results show that FaaS is not inherently good for parallel computations and that architectural differences across platforms are decisive for categorizing their performance. A key insight is the importance of the virtualization technologies and the scheduling approach of FaaS platforms. Parallelism improves with lighter virtualization and proactive scheduling, owing to finer resource allocation and faster elasticity. As a result, some platforms, such as AWS and IBM, perform well for highly parallel computations, while others, such as Azure, have difficulty achieving the required degree of parallelism. Consequently, this paper is of special interest to help users choose the most adequate infrastructure for their parallel applications.
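One common way to plot how a parallel job evolves is to record the start and end timestamps of each function invocation and sweep over them to obtain the number of concurrently running functions over time. The sketch below does that; the timestamps are hypothetical, intervals are treated as half-open [start, end), and this is not necessarily the paper's exact methodology.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) of one invocation, in seconds


def concurrency_timeline(intervals: List[Interval]) -> List[Tuple[float, int]]:
    """Return (time, running) change points for half-open [start, end) intervals."""
    events = []
    for start, end in intervals:
        events.append((start, +1))
        events.append((end, -1))
    # Sorting by (time, delta) processes ends (-1) before starts (+1) at the
    # same timestamp, so back-to-back invocations are not counted as overlapping.
    events.sort()
    timeline: List[Tuple[float, int]] = []
    running = 0
    for t, delta in events:
        running += delta
        if timeline and timeline[-1][0] == t:
            timeline[-1] = (t, running)  # keep one point per timestamp
        else:
            timeline.append((t, running))
    return timeline


def peak_concurrency(intervals: List[Interval]) -> int:
    return max((running for _, running in concurrency_timeline(intervals)), default=0)


# Hypothetical invocations of a 4-function job.
invocations = [(0.0, 4.0), (1.0, 3.0), (2.0, 6.0), (5.0, 7.0)]
print(peak_concurrency(invocations))  # 3
```

Comparing the peak of this timeline with the number of functions requested shows how close a platform gets to the intended degree of parallelism.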
Submitted 1 June, 2021; v1 submitted 28 October, 2020;
originally announced October 2020.
-
Triggerflow: Trigger-based Orchestration of Serverless Workflows
Authors:
Pedro García-López,
Aitor Arjona,
Josep Sampe,
Aleksander Slominski,
Lionel Villard
Abstract:
As more applications are being moved to the Cloud thanks to serverless computing, it is increasingly necessary to support native life cycle execution of those applications in the data center. But existing systems either focus on short-running workflows (like IBM Composer or Amazon Express Workflows) or impose considerable overheads for synchronizing massively parallel jobs (Azure Durable Functions, Amazon Step Functions, Google Cloud Composer). None of them are open systems enabling extensible interception and optimization of custom workflows. We present Triggerflow: an extensible Trigger-based Orchestration architecture for serverless workflows built on top of Knative Eventing and Kubernetes technologies. We demonstrate that Triggerflow is a novel serverless building block capable of constructing different reactive schedulers (State Machines, Directed Acyclic Graphs, Workflow as code). We also validate that it can support high-volume event processing workloads, auto-scale on demand and transparently optimize scientific workflows.
Submitted 17 June, 2020; v1 submitted 15 June, 2020;
originally announced June 2020.
-
Serverless End Game: Disaggregation enabling Transparency
Authors:
Pedro García-López,
Aleksander Slominski,
Simon Shillaker,
Michael Behrendt,
Barnard Metzler
Abstract:
For many years, the distributed systems community has struggled to smooth the transition from local to remote computing. Transparency means concealing the complexities of distributed programming, such as remote locations, failures, or scaling. For us, full transparency implies that we can compile, debug, and run unmodified single-machine code over effectively unlimited compute, storage, and memory resources. In this article, we explain why resource disaggregation in serverless computing is the definitive catalyst for enabling full transparency in the Cloud. We demonstrate with two experiments that we can achieve transparency today over disaggregated serverless resources and obtain performance comparable to local executions. We also show that locality cannot be neglected for many problems, and we present five open research challenges: granular middleware and locality, memory disaggregation, virtualization, elastic programming models, and optimized deployment. If full transparency is possible, who needs explicit middleware when remote entities can be treated as local ones? Can we close the curtains of distributed systems complexity for the majority of users?
Submitted 1 June, 2020;
originally announced June 2020.
-
ServerMix: Tradeoffs and Challenges of Serverless Data Analytics
Authors:
Pedro García-López,
Marc Sánchez-Artigas,
Simon Shillaker,
Peter Pietzuch,
David Breitgand,
Gil Vernik,
Pierre Sutra,
Tristan Tarrant,
Ana Juan Ferrer
Abstract:
Serverless computing has become very popular since it largely simplifies cloud programming. Developers no longer need to worry about provisioning or operating servers, and they pay only for the compute resources used when their code runs. This new cloud paradigm is well suited to many applications, and researchers have already begun investigating the feasibility of serverless computing for data analytics. Unfortunately, today's serverless computing presents important limitations that make it very difficult to support all sorts of analytics workloads. This paper first analyzes three fundamental trade-offs of today's serverless computing model and their relationship with data analytics. It studies how relaxing disaggregation, isolation, and simple scheduling can increase overall computing performance, but at the expense of essential aspects of the model such as elasticity, security, and sub-second activations, respectively. The consequence of these trade-offs is that analytics applications may well end up embracing hybrid systems composed of serverless and serverful components, which we call ServerMix in this paper. We review existing related work to show that most applications can actually be categorized as ServerMix. Finally, this paper introduces the major challenges of the CloudButton research project in managing these trade-offs.
Submitted 26 July, 2019;
originally announced July 2019.