-
Semi-supervised Community Detection using Glauber Dynamics for an Ising Model
Authors:
Konstantin Avrachenkov,
Diego Goldsztajn
Abstract:
We consider graphs with two communities and analyze an algorithm for learning the community labels when the edges of the graph and only a small fraction of the labels are known in advance. The algorithm is based on the Glauber dynamics for an Ising model where the energy function includes a quadratic penalty on the magnetization. The analysis focuses on graphs sampled from a Stochastic Block Model (SBM) with slowly growing mean degree. We derive a mean-field limit for the magnetization of each community, which can be used to choose the run-time of the algorithm to obtain a target accuracy level. We further prove that almost exact recovery is achieved in a number of iterations that is quasi-linear in the number of nodes. As a special case, our results provide the first rigorous analysis of the label propagation algorithm in the SBM with slowly diverging mean degree. We complement our theoretical results with several numerical experiments.
Submitted 10 June, 2025;
originally announced June 2025.
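To make the algorithm in the abstract above concrete, here is a minimal sketch of single-site Glauber dynamics with a quadratic magnetization penalty. The abstract does not give the exact energy, so everything specific here is an assumption for illustration: the energy $-\sum_{u \sim v} \sigma_u \sigma_v + (\alpha/n)(\sum_v \sigma_v)^2$, the parameter names `beta` and `alpha`, the helper names, and the SBM parameters. Revealed labels are kept frozen and only unlabeled nodes are updated.

```python
import math
import random


def plus_probability(local_field, beta):
    """Glauber probability of setting a spin to +1 given its local field."""
    x = 2.0 * beta * local_field
    # Numerically stable logistic function 1 / (1 + exp(-x)).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sample_sbm(n, p_in, p_out, rng):
    """Two-community SBM: first half labelled +1, second half -1."""
    labels = [1 if v < n // 2 else -1 for v in range(n)]
    neighbors = [[] for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            p = p_in if labels[u] == labels[v] else p_out
            if rng.random() < p:
                neighbors[u].append(v)
                neighbors[v].append(u)
    return labels, neighbors


def glauber_labels(neighbors, revealed, beta, alpha, steps, rng):
    """Run single-site Glauber updates; `revealed` maps node -> known label."""
    n = len(neighbors)
    spins = [revealed.get(v, rng.choice((-1, 1))) for v in range(n)]
    free = [v for v in range(n) if v not in revealed]
    if not free:
        return spins
    total = sum(spins)  # running magnetization (unnormalized)
    for _ in range(steps):
        v = rng.choice(free)
        others = total - spins[v]
        # Assumed local field: neighbor agreement minus magnetization penalty.
        field = sum(spins[u] for u in neighbors[v]) - 2.0 * alpha * others / n
        new = 1 if rng.random() < plus_probability(field, beta) else -1
        total += new - spins[v]
        spins[v] = new
    return spins


# Hypothetical example: 200 nodes, 10% of the labels revealed.
rng = random.Random(0)
truth, graph = sample_sbm(200, 0.10, 0.02, rng)
known = {v: truth[v] for v in rng.sample(range(200), 20)}
estimate = glauber_labels(graph, known, beta=1.0, alpha=1.0, steps=20000, rng=rng)
accuracy = sum(a == b for a, b in zip(estimate, truth)) / 200
```

The penalty term discourages assigning all nodes to one community, while the frozen revealed labels are what tie each estimated spin value to an actual community.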
-
Geometric lower bounds for the steady-state occupancy of processing networks with limited connectivity
Authors:
Diego Goldsztajn,
Andres Ferragut
Abstract:
We consider processing networks where multiple dispatchers are connected to single-server queues by a bipartite compatibility graph, modeling constraints that are common in data centers and cloud networks due to geographic reasons or data locality issues. We prove lower bounds for the steady-state occupancy, i.e., the complementary cumulative distribution function of the empirical queue length distribution. The lower bounds are geometric with ratios given by two flexibility metrics: the average degree of the dispatchers and a novel metric that averages the minimum degree over the compatible dispatchers across the servers. Using these lower bounds, we establish that the asymptotic performance of a growing processing network cannot match that of the classic Power-of-$d$ or JSQ policies unless the flexibility metrics approach infinity in the large-scale limit.
Submitted 13 May, 2025;
originally announced May 2025.
-
Dynamic load balancing for cloud systems under heterogeneous setup delays
Authors:
Fernando Paganini,
Diego Goldsztajn
Abstract:
We consider a distributed cloud service deployed at a set of distinct server pools. Arriving jobs are classified into heterogeneous types according to their setup times, which differ across pools. A dispatcher for each job type controls the balance of load between pools, based on decentralized feedback. The system of rates and queues is modeled by a fluid differential equation system, and analyzed via convex optimization. A first, myopic policy is proposed, based on task delay-to-service. Under a simplified dynamic fluid queue model, we prove global convergence to an equilibrium point which minimizes the mean setup time; however, this method incurs queueing delays. A second proposal is then developed based on proximal optimization, which explicitly models the setup queue and is proved to reach an optimal equilibrium, devoid of queueing delay. Results are demonstrated through a simulation example.
Submitted 6 May, 2025;
originally announced May 2025.
-
Asymptotically Optimal Policies for Weakly Coupled Markov Decision Processes
Authors:
Diego Goldsztajn,
Konstantin Avrachenkov
Abstract:
We consider the problem of maximizing the expected average reward obtained over an infinite time horizon by $n$ weakly coupled Markov decision processes. Our setup is a substantial generalization of the multi-armed restless bandit problem that allows for multiple actions and constraints. We establish a connection with a deterministic and continuous-variable control problem where the objective is to maximize the average reward derived from an occupancy measure that represents the empirical distribution of the processes when $n \to \infty$. We show that a solution of this fluid problem can be used to construct policies for the weakly coupled processes that achieve the maximum expected average reward as $n \to \infty$, and we give sufficient conditions for the existence of solutions. Under certain assumptions on the constraints, we prove that these conditions are automatically satisfied if the unconstrained single-process problem admits a suitable unichain and aperiodic policy. In particular, the assumptions include multi-armed restless bandits and a broad class of problems with multiple actions and inequality constraints. Also, the policies can be constructed in an explicit way in these cases. Our theoretical results are complemented by several concrete examples and numerical experiments, which include multichain setups that are covered by the theoretical results.
Submitted 6 December, 2024; v1 submitted 7 June, 2024;
originally announced June 2024.
-
Server saturation in skewed networks
Authors:
Diego Goldsztajn,
Sem C. Borst,
Johan S. H. van Leeuwaarden
Abstract:
We consider a model inspired by compatibility constraints that arise between tasks and servers in data centers, cloud computing systems and content delivery networks. The constraints are represented by a bipartite graph or network that interconnects dispatchers with compatible servers. Each dispatcher receives tasks over time and sends every task to a compatible server with the least number of tasks, or to a server with the least number of tasks among $d$ compatible servers selected uniformly at random. We focus on networks where the neighborhood of at least one server is skewed in a limiting regime. This means that the neighborhood contains a diverging number of dispatchers, each compatible with a uniformly bounded number of servers; thus, the degree of the central server approaches infinity while the degrees of many neighboring dispatchers remain bounded. We prove that each server with a skewed neighborhood saturates, in the sense that the mean number of tasks queueing in front of it in steady state approaches infinity. Paradoxically, this pathological behavior can even arise in random networks where nearly all the servers have at most one task in the limit.
Submitted 9 April, 2024;
originally announced April 2024.
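The two dispatching rules described in the abstract above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `dispatch`, the list/dict representation of the bipartite compatibility graph, sampling the $d$ servers without replacement, and breaking ties by list order are all choices made here, not details taken from the paper.

```python
import random


def dispatch(queue_lengths, compatible, d=None, rng=random):
    """Return the server chosen for one task.

    compatible: servers adjacent to the dispatcher in the bipartite graph.
    d=None: send the task to a compatible server with the fewest tasks.
    Otherwise: sample d compatible servers uniformly at random (assumed
    without replacement) and pick the one with the fewest tasks.
    """
    candidates = compatible
    if d is not None and d < len(compatible):
        candidates = rng.sample(compatible, d)
    # Ties are broken by position in the candidate list (an assumption).
    return min(candidates, key=lambda s: queue_lengths[s])


# Hypothetical example: two dispatchers "a" and "b", four servers.
queues = [3, 0, 2, 5]
graph = {"a": [0, 1], "b": [1, 2, 3]}
server = dispatch(queues, graph["b"])  # least loaded among servers 1, 2, 3
queues[server] += 1
```

In a skewed neighborhood as described above, many dispatchers with short `compatible` lists share one central server, so that server keeps receiving tasks even when it is not lightly loaded.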
-
Fluid limits for interacting queues in sparse dynamic graphs
Authors:
Diego Goldsztajn,
Sem C. Borst,
Johan S. H. van Leeuwaarden
Abstract:
Consider a network of $n$ single-server queues where tasks arrive independently at each server at rate $\lambda_n$. The servers are connected by a graph that is resampled at rate $\mu_n$ in a way that is symmetric with respect to the servers, and each task is dispatched to the shortest queue in the graph neighborhood where it appears. We aim to gain insight into the impact of the dynamic network structure on the load balancing dynamics in terms of the occupancy process, which describes the empirical distribution of the number of tasks across the servers. This process evolves on the underlying dynamic graph, and its dynamics depend on the number of tasks at each individual server and the neighborhood structure of the graph. We establish that this dependency disappears in the limit as $n \to \infty$ when $\lambda_n / n \to \lambda$ and $\mu_n \to \infty$, and prove that the limit of the occupancy process is given by a system of differential equations that depends solely on $\lambda$ and the limiting degree distribution of the graph. We further show that the stationary distribution of the occupancy process converges to an equilibrium of the differential equations, and derive properties of this equilibrium that reflect the impact of the degree distribution. Our focus is on truly sparse graphs where the maximum degree is uniformly bounded across $n$, which is natural in load balancing systems.
Submitted 29 May, 2025; v1 submitted 22 May, 2023;
originally announced May 2023.
-
Utility maximizing load balancing policies
Authors:
Diego Goldsztajn,
Sem C. Borst,
Johan S. H. van Leeuwaarden
Abstract:
Consider a service system where incoming tasks are instantaneously dispatched to one out of many heterogeneous server pools. Associated with each server pool is a concave utility function which depends on the class of the server pool and its current occupancy. We derive an upper bound for the mean normalized aggregate utility in stationarity and introduce two load balancing policies that achieve this upper bound in a large-scale regime. Furthermore, the transient and stationary behavior of these asymptotically optimal load balancing policies is characterized on the scale of the number of server pools, in the same large-scale regime.
Submitted 10 February, 2024; v1 submitted 16 December, 2021;
originally announced December 2021.
-
Learning and balancing unknown loads in large-scale systems
Authors:
Diego Goldsztajn,
Sem C. Borst,
Johan S. H. van Leeuwaarden
Abstract:
Consider a system of identical server pools where tasks with exponentially distributed service times arrive as a time-inhomogeneous Poisson process. An admission threshold is used in an inner control loop to assign incoming tasks to server pools while, in an outer control loop, a learning scheme adjusts this threshold over time to keep it aligned with the unknown offered load of the system. In a many-server regime, we prove that the learning scheme reaches an equilibrium along intervals of time where the normalized offered load per server pool is suitably bounded, and that this results in a balanced distribution of the load. Furthermore, we establish a similar result when tasks with Coxian distributed service times arrive at a constant rate and the threshold is adjusted using only the total number of tasks in the system. The novel proof technique developed in this paper, which differs from a traditional fluid limit analysis, makes it possible to handle rapid variations of the first learning scheme, triggered by excursions of the occupancy process that have vanishing size. Moreover, our approach allows us to characterize the asymptotic behavior of the system with Coxian distributed service times without relying on a fluid limit of a detailed state descriptor.
Submitted 5 April, 2024; v1 submitted 18 December, 2020;
originally announced December 2020.
-
Self-Learning Threshold-Based Load Balancing
Authors:
Diego Goldsztajn,
Sem C. Borst,
Johan S. H. van Leeuwaarden,
Debankur Mukherjee,
Philip A. Whiting
Abstract:
We consider a large-scale service system where incoming tasks have to be instantaneously dispatched to one out of many parallel server pools. The user-perceived performance degrades with the number of concurrent tasks and the dispatcher aims at maximizing the overall quality-of-service by balancing the load through a simple threshold policy. We demonstrate that such a policy is optimal on the fluid and diffusion scales, while only involving a small communication overhead, which is crucial for large-scale deployments. In order to set the threshold optimally, it is important, however, to learn the load of the system, which may be unknown. For that purpose, we design a control rule for tuning the threshold in an online manner. We derive conditions which guarantee that this adaptive threshold settles at the optimal value, along with estimates for the time until this happens. In addition, we provide numerical experiments which support the theoretical results and further indicate that our policy copes effectively with time-varying demand patterns.
Submitted 11 September, 2023; v1 submitted 29 October, 2020;
originally announced October 2020.