-
Performance of Load Balancers with Bounded Maximum Queue Length in case of Non-Exponential Job Sizes
Authors:
Tim Hellemans,
Grzegorz Kielanski,
Benny Van Houdt
Abstract:
In large-scale distributed systems, balancing the load in an efficient way is crucial in order to achieve low latency. Recently, some load balancing policies have been suggested which are able to achieve a bounded maximum queue length in the large-scale limit. However, these policies have thus far only been studied in case of exponential job sizes. As job sizes are more variable in real systems, we investigate how the performance of these policies (and in particular the value of these bounds) is impacted by the job size distribution.
We present a unified analysis which can be used to compute the bound on the queue length in case of phase-type distributed job sizes for four load balancing policies. We find that in most cases, the bound on the maximum queue length can be expressed in closed form. In addition, we obtain job size (in)dependent bounds on the expected response time.
Our methodology relies on the use of the cavity process. That is, we conjecture that the cavity process captures the behaviour of the real system as the system size grows large. For each policy, we illustrate the accuracy of the cavity process by means of simulation.
Submitted 11 January, 2022;
originally announced January 2022.
-
Download time analysis for distributed storage systems with node failures
Authors:
Tim Hellemans,
Arti Yardi,
Tejas Bodas
Abstract:
We consider a distributed storage system which stores several hot (popular) and cold (less popular) data files across multiple nodes or servers. Hot files are stored using repetition codes while cold files are stored using erasure codes. The nodes are prone to failure and hence at any given time, we assume that only a fraction of the nodes are available. Using a cavity process based mean field framework, we analyze the download time for users accessing hot or cold data in the presence of failed nodes. Our work also illustrates the impact of the choice of the storage code on the download time performance of users in the system.
Submitted 6 May, 2021;
originally announced May 2021.
-
Improved Load Balancing in Large Scale Systems using Attained Service Time Reporting
Authors:
Tim Hellemans,
Benny Van Houdt
Abstract:
Our interest lies in load balancing jobs in large scale systems consisting of multiple dispatchers and FCFS servers. In the absence of any information on job sizes, dispatchers typically use queue length information reported by the servers to assign incoming jobs. When job sizes are highly variable, using only queue length information is clearly suboptimal and performance can be improved if some indication can be provided to the dispatcher about the size of an ongoing job. In a FCFS server, measuring the attained service time of the ongoing job is easy, and servers can therefore report this attained service time together with the queue length when queried by a dispatcher.
In this paper we propose and analyse a variety of load balancing policies that exploit both the queue length and attained service time to assign jobs, as well as policies for which only the attained service time of the job in service is used. We present a unified analysis for all these policies in a large scale system under the usual asymptotic independence assumptions. The accuracy of the proposed analysis is illustrated using simulation.
We present extensive numerical experiments which clearly indicate that a significant improvement in waiting (and thus also in response) time may be achieved by using the attained service time information on top of the queue length of a server. Moreover, the policies which do not make use of the queue length still provide an improved waiting time for moderately loaded systems.
Submitted 15 April, 2021; v1 submitted 16 November, 2020;
originally announced November 2020.
-
Mean Waiting Time in Large-Scale and Critically Loaded Power of d Load Balancing Systems
Authors:
Tim Hellemans,
Benny Van Houdt
Abstract:
Mean field models are a popular tool used to analyse load balancing policies. In some cases the waiting time distribution of the mean field limit has an explicit form. In other cases it can be computed as the solution of a set of differential equations. Here we study the limit of the mean waiting time $E[W_λ]$ as the arrival rate $λ$ approaches $1$ for a number of load balancing policies when job sizes are exponential with mean $1$ (i.e. the system gets close to instability). As $E[W_λ]$ diverges to infinity, we scale with $-\log(1-λ)$ and present a method to compute the limit $\lim_{λ\rightarrow 1^-}-E[W_λ]/\log(1-λ)$. This limit has a surprisingly simple form for the load balancing algorithms considered. We present a general result that holds for any policy for which the associated differential equation satisfies a list of assumptions. For the LL(d) policy which assigns an incoming job to a server with the least work left among d randomly selected servers these assumptions are trivially verified. For this policy we prove the limit is given by $\frac{1}{d-1}$. We further show that the LL(d,K) policy, which assigns batches of $K$ jobs to the $K$ least loaded servers among d randomly selected servers, satisfies the assumptions and the limit is equal to $\frac{K}{d-K}$. For a policy which applies LL($d_i$) with probability $p_i$, we show that the limit is given by $\frac{1}{\sum_ip_id_i-1}$. We further indicate that our main result can also be used for load balancers with redundancy or memory. In addition, we propose an alternate scaling $-\log(p_λ)$ instead of $-\log(1-λ)$, for which the limit $\lim_{λ\rightarrow 0^+}-E[W_λ]/\log(p_λ)$ is well defined and non-zero (contrary to $\lim_{λ\rightarrow 0^+}-E[W_λ]/\log(1-λ)$), while $\lim_{λ\rightarrow 1^-}\log(1-λ) / \log(p_λ)=1$.
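The three closed-form limits quoted in this abstract can be evaluated with a short sketch. The function names below are hypothetical and are not taken from the paper. The `heavy_traffic_estimate` helper is only a first-order heuristic, assumed here for illustration, that reads the limit as $E[W_λ] \approx -c\,\log(1-λ)$ for $λ$ close to $1$; the abstract itself states only the value of the limit $c$.

```python
from fractions import Fraction
import math


def ll_d_limit(d):
    """Limit 1/(d-1) stated in the abstract for the LL(d) policy."""
    if d < 2:
        raise ValueError("d must be at least 2")
    return Fraction(1, d - 1)


def ll_d_k_limit(d, k):
    """Limit K/(d-K) stated in the abstract for the LL(d,K) policy."""
    if not 1 <= k < d:
        raise ValueError("need 1 <= K < d")
    return Fraction(k, d - k)


def mixed_ll_limit(probs, ds):
    """Limit 1/(sum_i p_i d_i - 1) for applying LL(d_i) with probability p_i.

    Pass probabilities as Fractions (or ints/strings such as "1/2") so the
    check that they sum to one is exact.
    """
    probs = [Fraction(p) for p in probs]
    if len(probs) != len(ds) or sum(probs) != 1:
        raise ValueError("probs must match ds in length and sum to 1")
    mean_d = sum(p * d for p, d in zip(probs, ds))
    if mean_d <= 1:
        raise ValueError("sum_i p_i d_i must exceed 1")
    return 1 / (mean_d - 1)


def heavy_traffic_estimate(lam, limit):
    """Heuristic (an assumption, not a result from the abstract):
    E[W_lam] ~ -limit * log(1 - lam) when lam is close to 1."""
    if not 0 < lam < 1:
        raise ValueError("lam must lie in (0, 1)")
    return -float(limit) * math.log(1 - lam)


if __name__ == "__main__":
    print("LL(2):", ll_d_limit(2))
    print("LL(4,2):", ll_d_k_limit(4, 2))
    print("mix of LL(1) and LL(3):", mixed_ll_limit(["1/2", "1/2"], [1, 3]))
    print("rough E[W] at lam=0.99 for LL(2):", heavy_traffic_estimate(0.99, ll_d_limit(2)))
```

As a consistency check, setting $K=1$ in the LL(d,K) formula, or using a single $d_i$ with $p_i=1$ in the mixed formula, recovers $\frac{1}{d-1}$.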
Submitted 28 January, 2021; v1 submitted 2 April, 2020;
originally announced April 2020.
-
Performance Analysis of Load Balancing Policies with Memory
Authors:
Tim Hellemans,
Benny Van Houdt
Abstract:
Joining the shortest or least loaded queue among $d$ randomly selected queues are two fundamental load balancing policies. Under both policies the dispatcher does not maintain any information on the queue length or load of the servers. In this paper we analyze the performance of these policies when the dispatcher has some memory available to store the ids of some of the idle servers. We consider methods where the dispatcher discovers idle servers as well as methods where idle servers inform the dispatcher about their state.
We focus on large-scale systems and our analysis uses the cavity method. The main insight provided is that the performance measures obtained via the cavity method for a load balancing policy {\it with} memory reduce to the performance measures for the same policy {\it without} memory provided that the arrival rate is properly scaled. Thus, we can study the performance of load balancers with memory in the same manner as load balancers without memory. In particular this entails closed form solutions for joining the shortest or least loaded queue among $d$ randomly selected queues with memory in case of exponential job sizes. Moreover, we obtain a simple closed form expression for the (scaled) expected waiting time as the system tends towards instability.
We present simulation results that support our belief that the approximation obtained by the cavity method becomes exact as the number of servers tends to infinity.
Submitted 22 January, 2021; v1 submitted 17 February, 2020;
originally announced February 2020.
-
On the Power-of-d-choices with Least Loaded Server Selection
Authors:
Tim Hellemans,
Benny Van Houdt
Abstract:
Motivated by distributed schedulers that combine the power-of-d-choices with late binding and systems that use replication with cancellation-on-start, we study the performance of the LL(d) policy which assigns a job to a server that currently has the least workload among d randomly selected servers in large-scale homogeneous clusters. We consider general service time distributions and propose a partial integro-differential equation to describe the evolution of the system. This equation relies on the earlier proven ansatz for LL(d) which asserts that the workload distribution of any finite set of queues becomes independent of one another as the number of servers tends to infinity. Based on this equation we propose a fixed point iteration for the limiting workload distribution and study its convergence. For exponential job sizes we present a simple closed form expression for the limiting workload distribution that is valid for any work-conserving service discipline as well as for the limiting response time distribution in case of first-come-first-served scheduling. We further show that for phase-type distributed job sizes the limiting workload and response time distribution can be expressed via the unique solution of a simple set of ordinary differential equations. Numerical and analytical results that compare response time of the classic power-of-d-choices algorithm and the LL(d) policy are also presented and the accuracy of the limiting response time distribution for finite systems is illustrated using simulation.
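The LL(d) assignment rule described in this abstract can be illustrated with a small finite-system simulation. This is a minimal sketch under stated assumptions, not the paper's analysis: it assumes Poisson arrivals at total rate `lam * n_servers`, exponential job sizes with mean 1, FCFS service, and sampling of the d servers without replacement. The function name and all parameters are hypothetical.

```python
import random


def simulate_ll_d(n_servers, d, lam, n_jobs, seed=0):
    """Estimate the mean waiting time under LL(d) in a finite system.

    Assumptions (for illustration only): Poisson arrivals with total rate
    lam * n_servers, exponential job sizes with mean 1, FCFS servers.
    """
    rng = random.Random(seed)
    # free_at[s] is the time at which server s finishes all work assigned so far,
    # so its workload at time `now` is max(0, free_at[s] - now).
    free_at = [0.0] * n_servers
    now = 0.0
    total_wait = 0.0
    for _ in range(n_jobs):
        now += rng.expovariate(lam * n_servers)      # next arrival epoch
        candidates = rng.sample(range(n_servers), d)  # d distinct random servers
        best = min(candidates, key=lambda s: max(0.0, free_at[s] - now))
        wait = max(0.0, free_at[best] - now)          # FCFS wait equals the workload found
        total_wait += wait
        free_at[best] = now + wait + rng.expovariate(1.0)  # append the new job's size
    return total_wait / n_jobs


if __name__ == "__main__":
    for d in (1, 2, 3):
        print(d, simulate_ll_d(n_servers=50, d=d, lam=0.8, n_jobs=20000))
```

With `d=1` the rule reduces to random assignment, so comparing it against `d=2` gives a rough feel for the benefit of sampling more than one server. The estimates include the initial transient from an empty system and should not be read as the limiting values studied in the paper.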
Submitted 15 February, 2018;
originally announced February 2018.