-
Multi-FedLS: a Framework for Cross-Silo Federated Learning Applications on Multi-Cloud Environments
Authors:
Rafaela C. Brum,
Maria Clicia Stelling de Castro,
Luciana Arantes,
Lúcia Maria de A. Drummond,
Pierre Sens
Abstract:
Federated Learning (FL) is a distributed Machine Learning (ML) technique that can benefit from cloud environments while preserving data privacy. We propose Multi-FedLS, a framework that manages multi-cloud resources, reducing execution time and financial costs of Cross-Silo Federated Learning applications by using preemptible VMs, cheaper than on-demand ones but that can be revoked at any time. Ou…
▽ More
Federated Learning (FL) is a distributed Machine Learning (ML) technique that can benefit from cloud environments while preserving data privacy. We propose Multi-FedLS, a framework that manages multi-cloud resources, reducing execution time and financial costs of Cross-Silo Federated Learning applications by using preemptible VMs, cheaper than on-demand ones but that can be revoked at any time. Our framework encloses four modules: Pre-Scheduling, Initial Mapping, Fault Tolerance, and Dynamic Scheduler. This paper extends our previous work \cite{brum2022sbac} by formally describing the Multi-FedLS resource manager framework and its modules. Experiments were conducted with three Cross-Silo FL applications on CloudLab and a proof-of-concept confirms that Multi-FedLS can be executed on a multi-cloud composed by AWS and GCP, two commercial cloud providers. Results show that the problem of executing Cross-Silo FL applications in multi-cloud environments with preemptible VMs can be efficiently resolved using a mathematical formulation, fault tolerance techniques, and a simple heuristic to choose a new VM in case of revocation.
△ Less
Submitted 17 August, 2023;
originally announced August 2023.
-
Scheduling Bag-of-Tasks in Clouds using Spot and Burstable Virtual Machines
Authors:
Luan Teylo,
Luciana Arantes,
Pierre Sens,
Lúcia Maria de A. Drummond
Abstract:
Leading Cloud providers offer several types of Virtual Machines (VMs) in diverse contract models, with different guarantees in terms of availability and reliability. Among them, the most popular contract models are the on-demand and the spot models. In the former, on-demand VMs are allocated for a fixed cost per time unit, and their availability is ensured during the whole execution. On the other…
▽ More
Leading Cloud providers offer several types of Virtual Machines (VMs) in diverse contract models, with different guarantees in terms of availability and reliability. Among them, the most popular contract models are the on-demand and the spot models. In the former, on-demand VMs are allocated for a fixed cost per time unit, and their availability is ensured during the whole execution. On the other hand, in the spot market, VMs are offered with a huge discount when compared to the on-demand VMs, but their availability fluctuates according to the cloud's current demand that can terminate or hibernate a spot VM at any time. Furthermore, in order to cope with workload variations, cloud providers have also introduced the concept of burstable VMs which are able to burst up their respective baseline CPU performance during a limited period of time with an up to 20% discount when compared to an equivalent non-burstable on-demand VMs. In the current work, we present the Burst Hibernation-Aware Dynamic Scheduler (Burst-HADS), a framework that schedules and executes tasks of Bag-of-Tasks applications with deadline constraints by exploiting spot and on-demand burstable VMs, aiming at minimizing both the monetary cost and the execution time. Based on ILS metaheuristics, Burst-HADS defines an initial scheduling map of tasks to VMs which can then be dynamically altered by migrating tasks of a hibernated spot VM or by performing work-stealing when VMs become idle. Performance results on Amazon EC2 cloud with different applications show that, when compared to a solution that uses only regular on-demand instances, Burst-HADS reduces the monetary cost of the execution and meet the application deadline even in scenarios with high spot hibernation rates. It also reduces the total execution time when compared to a solution that uses only spot and non-burstable on-demand instances.
△ Less
Submitted 10 November, 2020;
originally announced November 2020.
-
A Bag-of-Tasks Scheduler Tolerant to Temporal Failures in Clouds
Authors:
Luan Teylo,
Lúcia Maria de A. Drummond,
Luciana Arantes,
Pierre Sens
Abstract:
Cloud platforms have emerged as a prominent environment to execute high performance computing (HPC) applications providing on-demand resources as well as scalability. They usually offer different classes of Virtual Machines (VMs) which ensure different guarantees in terms of availability and volatility, provisioning the same resource through multiple pricing models. For instance, in Amazon EC2 clo…
▽ More
Cloud platforms have emerged as a prominent environment to execute high performance computing (HPC) applications providing on-demand resources as well as scalability. They usually offer different classes of Virtual Machines (VMs) which ensure different guarantees in terms of availability and volatility, provisioning the same resource through multiple pricing models. For instance, in Amazon EC2 cloud, the user pays per hour for on-demand VMs while spot VMs are unused instances available for lower price. Despite the monetary advantages, a spot VM can be terminated, stopped, or hibernated by EC2 at any moment.
Using both hibernation-prone spot VMs (for cost sake) and on-demand VMs, we propose in this paper a static scheduling for HPC applications which are composed by independent tasks (bag-of-task) with deadline constraints. However, if a spot VM hibernates and it does not resume within a time which guarantees the application's deadline, a temporal failure takes place. Our scheduling, thus, aims at minimizing monetary costs of bag-of-tasks applications in EC2 cloud, respecting its deadline and avoiding temporal failures. To this end, our algorithm statically creates two scheduling maps: (i) the first one contains, for each task, its starting time and on which VM (i.e., an available spot or on-demand VM with the current lowest price) the task should execute; (ii) the second one contains, for each task allocated on a VM spot in the first map, its starting time and on which on-demand VM it should be executed to meet the application deadline in order to avoid temporal failures. The latter will be used whenever the hibernation period of a spot VM exceeds a time limit.
Performance results from simulation with task execution traces, configuration of Amazon EC2 VM classes, and VMs market history confirms the effectiveness of our scheduling and that it tolerates temporal failures.
△ Less
Submitted 24 October, 2018;
originally announced October 2018.
-
A barrier-type method for multiobjective optimization
Authors:
Ellen H. Fukuda,
L. M. Grana Drummond,
Fernanda M. P. Raupp
Abstract:
For solving constrained multicriteria problems, we introduce the multiobjective barrier method (MBM), which extends the scalar-valued internal penalty method. This multiobjective version of the classical method also requires a penalty barrier for the feasible set and a sequence of nonnegative penalty parameters. Differently from the single-valued procedure, MBM is implemented by means of an auxili…
▽ More
For solving constrained multicriteria problems, we introduce the multiobjective barrier method (MBM), which extends the scalar-valued internal penalty method. This multiobjective version of the classical method also requires a penalty barrier for the feasible set and a sequence of nonnegative penalty parameters. Differently from the single-valued procedure, MBM is implemented by means of an auxiliary "monotonic" real-valued mapping, which may be chosen in a quite large set of functions. Here, we consider problems with continuous objective functions, where the feasible sets are defined by finitely many continuous inequalities. Under mild assumptions, and depending on the monotonicity type of the auxiliary function, we establish convergence to Pareto or weak Pareto optima. Finally, we also propose an implementable version of MBM for seeking local optima and analyze its convergence to Pareto or weak Pareto solutions.
△ Less
Submitted 30 March, 2018;
originally announced March 2018.
-
A Quantitative Model for Predicting Cross-application Interference in Virtual Environments
Authors:
Maicon Melo Alves,
Lúcia Maria de Assumpção Drummond
Abstract:
Cross-application interference can affect drastically performance of HPC applications when running in clouds. This problem is caused by concurrent access performed by co-located applications to shared and non-sliceable resources such as cache and memory. In order to address this issue, some works adopted a qualitative approach that does not take into account the amount of access to shared resource…
▽ More
Cross-application interference can affect drastically performance of HPC applications when running in clouds. This problem is caused by concurrent access performed by co-located applications to shared and non-sliceable resources such as cache and memory. In order to address this issue, some works adopted a qualitative approach that does not take into account the amount of access to shared resources. In addition, a few works, even considering the amount of access, evaluated just the SLLC access contention as the root of this problem. However, our experiments revealed that interference is intrinsically related to the amount of simultaneous access to shared resources, besides showing that another shared resources, apart from SLLC, can also influence the interference suffered by co-located applications. In this paper, we present a quantitative model for predicting cross-application interference in virtual environments. Our proposed model takes into account the amount of simultaneous access to SLLC, DRAM and virtual network, and the similarity of application's access burden to predict the level of interference suffered by applications when co-located in a same physical machine. Experiments considering a real petroleum reservoir simulator and applications from HPCC benchmark showed that our model reached an average and maximum prediction errors around 4\% and 12\%, besides achieving an error less than 10\% in approximately 96\% of all tested cases.
△ Less
Submitted 13 October, 2016;
originally announced October 2016.
-
Solving the Quadratic Assignment Problem on heterogeneous environment (CPUs and GPUs) with the application of Level 2 Reformulation and Linearization Technique
Authors:
Alexandre Domingues Gonçalves,
Artur Alves Pessoa,
Lúcia Maria de Assumpção Drummond,
Cristiana Bentes,
Ricardo Farias
Abstract:
The Quadratic Assignment Problem, QAP, is a classic combinatorial optimization problem, classified as NP-hard and widely studied. This problem consists in assigning N facilities to N locations obeying the relation of 1 to 1, aiming to minimize costs of the displacement between the facilities. The application of Reformulation and Linearization Technique, RLT, to the QAP leads to a tight linear rela…
▽ More
The Quadratic Assignment Problem, QAP, is a classic combinatorial optimization problem, classified as NP-hard and widely studied. This problem consists in assigning N facilities to N locations obeying the relation of 1 to 1, aiming to minimize costs of the displacement between the facilities. The application of Reformulation and Linearization Technique, RLT, to the QAP leads to a tight linear relaxation but large and difficult to solve. Previous works based on level 3 RLT needed about 700GB of working memory to process one large instances (N = 30 facilities). We present a modified version of the algorithm proposed by Adams et al. which executes on heterogeneous systems (CPUs and GPUs), based on level 2 RLT. For some instances, our algorithm is up to 140 times faster and occupy 97% less memory than the level 3 RLT version. The proposed algorithm was able to solve by first time two instances: tai35b and tai40b.
△ Less
Submitted 7 October, 2015;
originally announced October 2015.
-
Handling Flash-Crowd Events to Improve the Performance of Web Applications
Authors:
Ubiratam de Paula Junior,
Lúcia M. A. Drummond,
Daniel de Oliveira,
Yuri Frota,
Valmir C. Barbosa
Abstract:
Cloud computing can offer a set of computing resources according to users' demand. It is suitable to be used to handle flash-crowd events in Web applications due to its elasticity and on-demand characteristics. Thus, when Web applications need more computing or storage capacity, they just instantiate new resources. However, providers have to estimate the amount of resources to instantiate to handl…
▽ More
Cloud computing can offer a set of computing resources according to users' demand. It is suitable to be used to handle flash-crowd events in Web applications due to its elasticity and on-demand characteristics. Thus, when Web applications need more computing or storage capacity, they just instantiate new resources. However, providers have to estimate the amount of resources to instantiate to handle with the flash-crowd event. This estimation is far from trivial since each cloud environment provides several kinds of heterogeneous resources, each one with its own characteristics such as bandwidth, CPU, memory and financial cost. In this paper, the Flash Crowd Handling Problem (FCHP) is precisely defined and formulated as an integer programming problem. A new algorithm for handling with a flash crowd named FCHP-ILS is also proposed. With FCHP-ILS the Web applications can replicate contents in the already instantiated resources and define the types and amount of resources to instantiate in the cloud during a flash crowd. Our approach is evaluated considering real flash crowd traces obtained from the related literature. We also present a case study, based on a synthetic dataset representing flash-crowd events in small scenarios aiming at the comparison of the proposed approach against Amazon's Auto-Scale mechanism.
△ Less
Submitted 10 October, 2014;
originally announced October 2014.
-
Improving Lower Bounds for the Quadratic Assignment Problem by applying a Distributed Dual Ascent Algorithm
Authors:
Alexandre Domingues Goncalves,
Lucia Maria Drummond,
Artur Alves Pessoa,
Peter Hahn
Abstract:
The application of the Reformulation Linearization Technique (RLT) to the Quadratic Assignment Problem (QAP) leads to a tight linear relaxation with huge dimensions that is hard to solve. Previous works found in the literature show that these relaxations combined with branch-and-bound algorithms belong to the state-of-the-art of exact methods for the QAP. For the level 3 RLT (RLT3), using this rel…
▽ More
The application of the Reformulation Linearization Technique (RLT) to the Quadratic Assignment Problem (QAP) leads to a tight linear relaxation with huge dimensions that is hard to solve. Previous works found in the literature show that these relaxations combined with branch-and-bound algorithms belong to the state-of-the-art of exact methods for the QAP. For the level 3 RLT (RLT3), using this relaxation is prohibitive in conventional machines for instances with more than 22 locations due to memory limitations. This paper presents a distributed version of a dual ascent algorithm for the RLT3 QAP relaxation that approximately solves it for instances with up to 30 locations for the first time. Although, basically, the distributed algorithm has been implemented on top of its sequential conterpart, some changes, which improved not only the parallel performance but also the quality of solutions, were proposed here. When compared to other lower bounding methods found in the literature, our algorithm generates the best known lower bounds for 26 out of the 28 tested instances, reaching the optimal solution in 18 of them.
△ Less
Submitted 31 March, 2013;
originally announced April 2013.
-
Memory Aware Load Balance Strategy on a Parallel Branch-and-Bound Application
Authors:
Juliana M. N. Silva,
Cristina Boeres,
Lúcia M. A. Drummond,
Artur A. Pessoa
Abstract:
The latest trends in high-performance computing systems show an increasing demand on the use of a large scale multicore systems in a efficient way, so that high compute-intensive applications can be executed reasonably well. However, the exploitation of the degree of parallelism available at each multicore component can be limited by the poor utilization of the memory hierarchy available. Actually…
▽ More
The latest trends in high-performance computing systems show an increasing demand on the use of a large scale multicore systems in a efficient way, so that high compute-intensive applications can be executed reasonably well. However, the exploitation of the degree of parallelism available at each multicore component can be limited by the poor utilization of the memory hierarchy available. Actually, the multicore architecture introduces some distinct features that are already observed in shared memory and distributed environments. One example is that subsets of cores can share different subsets of memory. In order to achieve high performance it is imperative that a careful allocation scheme of an application is carried out on the available cores, based on a scheduling model that considers the main performance bottlenecks, as for example, memory contention. In this paper, the {\em Multicore Cluster Model} (MCM) is proposed, which captures the most relevant performance characteristics in multicores systems such as the influence of memory hierarchy and contention. Better performance was achieved when a load balance strategy for a Branch-and-Bound application applied to the Partitioning Sets Problem is based on MCM, showing its efficiency and applicability to modern systems.
△ Less
Submitted 22 February, 2013;
originally announced February 2013.
-
A Distributed Transportation Simplex Applied to a Content Distribution Network Problem
Authors:
Rafaelli de C. Coutinho,
Lúcia M. A. Drummond,
Yuri Frota
Abstract:
A Content Distribution Network (CDN) can be defined as an overlay system that replicates copies of contents at multiple points of a network, close to the final users, with the objective of improving data access. CDN technology is widely used for the distribution of large-sized contents, like in video streaming. In this paper we address the problem of finding the best server for each customer reque…
▽ More
A Content Distribution Network (CDN) can be defined as an overlay system that replicates copies of contents at multiple points of a network, close to the final users, with the objective of improving data access. CDN technology is widely used for the distribution of large-sized contents, like in video streaming. In this paper we address the problem of finding the best server for each customer request in CDNs, in order to minimize the overall cost. We consider the problem as a transportation problem and a distributed algorithm is proposed to solve it. The algorithm is composed of two independent phases: a distributed heuristic finds an initial solution that may be later improved by a distributed transportation simplex algorithm. It is compared with the sequential version of the transportation simplex and with an auction-based distributed algorithm. Computational experiments carried out on a set of instances adapted from the literature revealed that our distributed approach has a performance similar to its sequential counterpart, in spite of not requiring global information about the contents requests. Moreover, the results also showed that the new method outperforms the based-auction distributed algorithm.
△ Less
Submitted 23 October, 2012;
originally announced October 2012.
-
On reducing the complexity of matrix clocks
Authors:
L. M. A. Drummond,
V. C. Barbosa
Abstract:
Matrix clocks are a generalization of the notion of vector clocks that allows the local representation of causal precedence to reach into an asynchronous distributed computation's past with depth $x$, where $x\ge 1$ is an integer. Maintaining matrix clocks correctly in a system of $n$ nodes requires that everymessage be accompanied by $O(n^x)$ numbers, which reflects an exponential dependency of…
▽ More
Matrix clocks are a generalization of the notion of vector clocks that allows the local representation of causal precedence to reach into an asynchronous distributed computation's past with depth $x$, where $x\ge 1$ is an integer. Maintaining matrix clocks correctly in a system of $n$ nodes requires that everymessage be accompanied by $O(n^x)$ numbers, which reflects an exponential dependency of the complexity of matrix clocks upon the desired depth $x$. We introduce a novel type of matrix clock, one that requires only $nx$ numbers to be attached to each message while maintaining what for many applications may be the most significant portion of the information that the original matrix clock carries. In order to illustrate the new clock's applicability, we demonstrate its use in the monitoring of certain resource-sharing computations.
△ Less
Submitted 23 September, 2003;
originally announced September 2003.