-
Improving Upon the generalized c-mu rule: a Whittle approach
Authors:
Zhouzi Li,
Keerthana Gurushankar,
Mor Harchol-Balter,
Alan Scheller-Wolf
Abstract:
Scheduling a stream of jobs whose holding cost changes over time is a classic and practical problem. Specifically, each job is associated with a holding cost (penalty), where a job's instantaneous holding cost is some increasing function of its class and current age (the time it has spent in the system since its arrival). The goal is to schedule the jobs to minimize the time-average total holding cost across all jobs.
The seminal paper on this problem, by Van Mieghem in 1995, introduced the generalized c-mu rule for scheduling jobs. Since then, this problem has attracted significant interest but remains challenging due to the absence of a finite-dimensional state space formulation. Consequently, subsequent works focus on more tractable versions of this problem.
This paper returns to the original problem, deriving a heuristic that empirically improves upon the generalized c-mu rule and all existing heuristics. Our approach is to first translate the holding cost minimization problem to a novel Restless Multi-Armed Bandit (R-MAB) problem with a finite number of arms. Based on our R-MAB, we derive a novel Whittle Index policy, which is both elegant and intuitive.
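As an illustration of the baseline this paper improves on, the sketch below shows a generalized c-mu style index. It assumes the commonly stated form of Van Mieghem's rule, which serves the class whose oldest waiting job maximizes the service rate times the marginal holding cost at its current age. The class names, cost functions, and service rates are hypothetical, and the abstract itself does not spell out this formula.

```python
def gcmu_index(cost_derivative, service_rate, age):
    """Index of a class: service rate times marginal holding cost at the given age."""
    return service_rate * cost_derivative(age)


def pick_class(classes, head_ages):
    """Return the class name with the largest index.

    classes: name -> (derivative of the holding cost function, service rate)
    head_ages: name -> age of the oldest waiting job in that class
    """
    return max(
        head_ages,
        key=lambda name: gcmu_index(classes[name][0], classes[name][1], head_ages[name]),
    )


# Hypothetical classes: A has quadratic cost t^2 (derivative 2t), B has linear cost 3t.
classes = {
    "A": (lambda t: 2.0 * t, 1.0),
    "B": (lambda t: 3.0, 2.0),
}

choice_young_a = pick_class(classes, {"A": 2.0, "B": 5.0})  # indices: A = 4, B = 6
choice_old_a = pick_class(classes, {"A": 4.0, "B": 5.0})    # indices: A = 8, B = 6
print(choice_young_a, choice_old_a)
```

With an age-dependent marginal cost, the preferred class can flip as jobs wait longer, which is the behavior a static priority order cannot capture.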
Submitted 14 April, 2025;
originally announced April 2025.
-
The RESET and MARC Techniques, with Application to Multiserver-Job Analysis
Authors:
Isaac Grosof,
Yige Hong,
Mor Harchol-Balter,
Alan Scheller-Wolf
Abstract:
Multiserver-job (MSJ) systems, where jobs need to run concurrently across many servers, are increasingly common in practice. The default service ordering in many settings is First-Come First-Served (FCFS) service. Virtually all theoretical work on MSJ FCFS models focuses on characterizing the stability region, with almost nothing known about mean response time.
We derive the first explicit characterization of mean response time in the MSJ FCFS system. Our formula characterizes mean response time up to an additive constant, which becomes negligible as arrival rate approaches throughput, and allows for general phase-type job durations.
We derive our result by utilizing two key techniques: REduction to Saturated for Expected Time (RESET) and MArkovian Relative Completions (MARC).
Using our novel RESET technique, we reduce the problem of characterizing mean response time in the MSJ FCFS system to an M/M/1 with Markovian service rate (MMSR). The Markov chain controlling the service rate is based on the saturated system, a simpler closed system which is far more analytically tractable.
Unfortunately, the MMSR has no explicit characterization of mean response time. We therefore use our novel MARC technique to give the first explicit characterization of mean response time in the MMSR, again up to constant additive error. We specifically introduce the concept of "relative completions," which is the cornerstone of our MARC technique.
Submitted 2 October, 2023;
originally announced October 2023.
-
Optimal Scheduling in the Multiserver-job Model under Heavy Traffic
Authors:
Isaac Grosof,
Ziv Scully,
Mor Harchol-Balter,
Alan Scheller-Wolf
Abstract:
Multiserver-job systems, where jobs require concurrent service at many servers, occur widely in practice. Essentially all of the theoretical work on multiserver-job systems focuses on maximizing utilization, with almost nothing known about mean response time. In simpler settings, such as various known-size single-server-job settings, minimizing mean response time is merely a matter of prioritizing small jobs. However, for the multiserver-job system, prioritizing small jobs is not enough, because we must also ensure servers are not unnecessarily left idle. Thus, minimizing mean response time requires prioritizing small jobs while simultaneously maximizing throughput. Our question is how to achieve these joint objectives.
We devise the ServerFilling-SRPT scheduling policy, which is the first policy to minimize mean response time in the multiserver-job model in the heavy traffic limit. In addition to proving this heavy-traffic result, we present empirical evidence that ServerFilling-SRPT outperforms all existing scheduling policies for all loads, with improvements by orders of magnitude at higher loads.
Because ServerFilling-SRPT requires knowing job sizes, we also define the ServerFilling-Gittins policy, which is optimal when sizes are unknown or partially known.
Submitted 4 November, 2022;
originally announced November 2022.
-
Capacity Management in a Pandemic with Endogenous Patient Choices and Flows
Authors:
Sanyukta Deshpande,
Lavanya Marla,
Alan Scheller-Wolf,
Siddharth Prakash Singh
Abstract:
Motivated by the experiences of a healthcare service provider during the COVID-19 pandemic, we study the decisions of a provider that operates both an Emergency Department (ED) and a medical Clinic. Patients contact the provider through a phone call or may present directly at the ED; patients can be COVID (suspected/confirmed) or non-COVID, and have different severities. Depending on the severity, patients who contact the provider may be directed to the ED (to be seen in a few hours), be offered an appointment at the Clinic (to be seen in a few days), or be treated via phone or telemedicine, avoiding a visit to a facility. All patients make joining decisions by comparing their own risk perceptions against their anticipated benefits; they choose to enter a facility only if it is beneficial enough. Also, after initial contact, their severities may evolve, which may change their decision. The hospital system's objective is to allocate service capacity across facilities so as to minimize costs from patient deaths or defections. We model the system using a fluid approximation over multiple periods, possibly with different demand profiles. While the feasible space for this problem can be extremely complex, it is amenable to decomposition into different sub-regions that can be analyzed individually. The global optimal solution can then be reached via provably parsimonious computational methods, both over a single period and over multiple periods with different demand rates. Our analytical and computational results indicate that endogeneity leads to non-trivial and non-intuitive capacity allocations that do not always prioritize high-severity patients, in both single- and multi-period settings.
Submitted 11 July, 2022;
originally announced July 2022.
-
WCFS: A new framework for analyzing multiserver systems
Authors:
Isaac Grosof,
Mor Harchol-Balter,
Alan Scheller-Wolf
Abstract:
Multiserver queueing systems are found at the core of a wide variety of practical systems. Many important multiserver models have a previously-unexplained similarity: identical mean response time behavior is empirically observed in the heavy traffic limit. We explain this similarity for the first time.
We do so by introducing the work-conserving finite-skip (WCFS) framework, which encompasses a broad class of important models. This class includes the heterogeneous M/G/k, the limited processor sharing policy for the M/G/1, the threshold parallelism model, and the multiserver-job model under a novel scheduling algorithm.
We prove that for all WCFS models, scaled mean response time $E[T](1-ρ)$ converges to the same value, $E[S^2]/(2E[S])$, in the heavy-traffic limit, which is also the heavy traffic limit for the M/G/1/FCFS. Moreover, we prove additively tight bounds on mean response time for the WCFS class, which hold for all load $ρ$. For each of the four models mentioned above, our bounds are the first known bounds on mean response time.
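The limiting value can be checked numerically for the M/G/1/FCFS reference case. The sketch below uses the standard Pollaczek-Khinchine formula $E[T] = E[S] + λE[S^2]/(2(1-ρ))$; the function names and the example moments ($E[S]=1$, $E[S^2]=10$) are illustrative assumptions, not values from the paper.

```python
def mg1_fcfs_mean_response_time(lam, es, es2):
    """Pollaczek-Khinchine mean response time of an M/G/1/FCFS queue.

    lam: arrival rate, es: E[S], es2: E[S^2].
    """
    rho = lam * es
    if not 0 <= rho < 1:
        raise ValueError("load must satisfy 0 <= rho < 1")
    return es + lam * es2 / (2 * (1 - rho))


def scaled_response_time(rho, es, es2):
    """E[T] * (1 - rho) at load rho, with the arrival rate set to rho / E[S]."""
    lam = rho / es
    return mg1_fcfs_mean_response_time(lam, es, es2) * (1 - rho)


# Hypothetical high-variability job size distribution.
ES, ES2 = 1.0, 10.0
LIMIT = ES2 / (2 * ES)  # heavy-traffic limit E[S^2] / (2 E[S])

for rho in (0.9, 0.99, 0.999):
    print(rho, scaled_response_time(rho, ES, ES2), LIMIT)
```

For these moments the scaled value simplifies to $1 + 4ρ$, so it approaches the limit of 5 as the load approaches 1.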
Submitted 12 June, 2022; v1 submitted 26 September, 2021;
originally announced September 2021.
-
Stability for Two-class Multiserver-job Systems
Authors:
Isaac Grosof,
Mor Harchol-Balter,
Alan Scheller-Wolf
Abstract:
Multiserver-job systems, where jobs require concurrent service at many servers, occur widely in practice. Much is known in the dropping setting, where jobs are immediately discarded if they require more servers than are currently available. However, very little is known in the more practical setting where jobs queue instead.
In this paper, we derive a closed-form analytical expression for the stability region of a two-class (non-dropping) multiserver-job system where each class of jobs requires a distinct number of servers and requires a distinct exponential distribution of service time, and jobs are served in first-come-first-served (FCFS) order. This is the first result of any kind for an FCFS multiserver-job system where the classes have distinct service distributions. Our work is based on a technique that leverages the idea of a "saturated" system, in which an unlimited number of jobs are always available.
Our analytical formula provides insight into the behavior of FCFS multiserver-job systems, highlighting the huge wastage (idle servers while jobs are in the queue) that can occur, as well as the nonmonotonic effects of the service rates on wastage.
Submitted 1 October, 2020;
originally announced October 2020.
-
Simple Near-Optimal Scheduling for the M/G/1
Authors:
Ziv Scully,
Mor Harchol-Balter,
Alan Scheller-Wolf
Abstract:
We consider the problem of preemptively scheduling jobs to minimize mean response time of an M/G/1 queue. When we know each job's size, the shortest remaining processing time (SRPT) policy is optimal. Unfortunately, in many settings we do not have access to each job's size. Instead, we know only the job size distribution. In this setting the Gittins policy is known to minimize mean response time, but its complex priority structure can be computationally intractable. A much simpler alternative to Gittins is the shortest expected remaining processing time (SERPT) policy. While SERPT is a natural extension of SRPT to unknown job sizes, it is unknown whether or not SERPT is close to optimal for mean response time.
We present a new variant of SERPT called monotonic SERPT (M-SERPT) which is as simple as SERPT but has provably near-optimal mean response time at all loads for any job size distribution. Specifically, we prove the mean response time ratio between M-SERPT and Gittins is at most 3 for load $ρ\leq 8/9$ and at most 5 for any load. This makes M-SERPT the only non-Gittins scheduling policy known to have a constant-factor approximation ratio for mean response time.
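A minimal sketch of the two priority rules for a discrete job size distribution follows. It assumes that SERPT ranks a job by its expected remaining size $E[S - a \mid S > a]$ at age $a$ (lower rank is served first), and that M-SERPT uses the running maximum of that rank over ages up to $a$. The second assumption is one reading of "monotonic" that the abstract does not state explicitly, and the two-point distribution is hypothetical.

```python
def serpt_rank(sizes, probs, age):
    """Expected remaining size E[S - age | S > age] for a discrete size distribution."""
    tail = [(s, p) for s, p in zip(sizes, probs) if s > age]
    mass = sum(p for _, p in tail)
    if mass == 0:
        raise ValueError("no job size exceeds this age")
    return sum((s - age) * p for s, p in tail) / mass


def m_serpt_rank(sizes, probs, age):
    """Running maximum of the SERPT rank over ages in [0, age] (assumed definition).

    For a discrete distribution the SERPT rank decreases between support points
    and can only jump upward at a support point, so it suffices to check age 0
    and the support points that are at most `age`.
    """
    largest = max(sizes)
    candidates = [0.0] + [s for s in sizes if s <= age and s < largest]
    return max(serpt_rank(sizes, probs, b) for b in candidates)


# Hypothetical distribution: 90% of jobs have size 1, 10% have size 10.
SIZES, PROBS = [1, 10], [0.9, 0.1]

print(serpt_rank(SIZES, PROBS, 5))    # expected remaining size of a job at age 5
print(m_serpt_rank(SIZES, PROBS, 5))  # stays at the peak reached at age 1
```

Under this reading, a job that has survived past size 1 keeps the rank it had at the jump, rather than regaining priority as its expected remaining size shrinks.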
Submitted 22 January, 2020; v1 submitted 24 July, 2019;
originally announced July 2019.
-
Optimal Scheduling and Exact Response Time Analysis for Multistage Jobs
Authors:
Ziv Scully,
Mor Harchol-Balter,
Alan Scheller-Wolf
Abstract:
Scheduling to minimize mean response time in an M/G/1 queue is a classic problem. The problem is usually addressed in one of two scenarios. In the perfect-information scenario, the scheduler knows each job's exact size, or service requirement. In the zero-information scenario, the scheduler knows only each job's size distribution. The well-known shortest remaining processing time (SRPT) policy is optimal in the perfect-information scenario, and the more complex Gittins policy is optimal in the zero-information scenario.
In real systems the scheduler often has partial but incomplete information about each job's size. We introduce a new job model, that of multistage jobs, to capture this partial-information scenario. A multistage job consists of a sequence of stages, where both the sequence of stages and stage sizes are unknown, but the scheduler always knows which stage of a job is in progress. We give an optimal algorithm for scheduling multistage jobs in an M/G/1 queue and an exact response time analysis of our algorithm.
Submitted 12 November, 2018; v1 submitted 17 May, 2018;
originally announced May 2018.
-
SOAP: One Clean Analysis of All Age-Based Scheduling Policies
Authors:
Ziv Scully,
Mor Harchol-Balter,
Alan Scheller-Wolf
Abstract:
We consider an extremely broad class of M/G/1 scheduling policies called SOAP: Schedule Ordered by Age-based Priority. The SOAP policies include almost all scheduling policies in the literature as well as an infinite number of variants which have never been analyzed, or maybe not even conceived. SOAP policies range from classic policies, like first-come, first-serve (FCFS), foreground-background (FB), class-based priority, and shortest remaining processing time (SRPT); to much more complicated scheduling rules, such as the famously complex Gittins index policy and other policies in which a job's priority changes arbitrarily with its age. While the response time of policies in the former category is well understood, policies in the latter category have resisted response time analysis. We present a universal analysis of all SOAP policies, deriving the mean and Laplace-Stieltjes transform of response time.
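The idea of ordering jobs by an age-based priority can be sketched with a few of the classic policies named above. The sketch assumes the convention that each policy maps a job to a numeric rank, that the job with the lowest rank is served, and that ties go to the earliest arrival. The `Job` type, the rank table, and the snapshot of jobs are hypothetical illustrations, not code from the paper.

```python
from dataclasses import dataclass


@dataclass
class Job:
    arrival: float    # arrival time
    size: float       # total service requirement
    age: float = 0.0  # service received so far


# Rank as a function of the job's age (and size, for SRPT); lower rank is served first.
RANKS = {
    "FCFS": lambda job: -job.age,           # a job that has started keeps the server
    "FB": lambda job: job.age,              # least-attained service first
    "SRPT": lambda job: job.size - job.age, # shortest remaining size first
}


def next_job(jobs, policy):
    """Pick the job with the lowest rank, breaking ties by earliest arrival."""
    rank = RANKS[policy]
    return min(jobs, key=lambda job: (rank(job), job.arrival))


# Hypothetical snapshot: one partially served job and two fresh arrivals.
jobs = [Job(0.0, 10.0, 3.0), Job(1.0, 8.0), Job(2.0, 5.0)]

for policy in RANKS:
    print(policy, next_job(jobs, policy).arrival)
```

On this snapshot each of the three policies selects a different job, even though they share the same selection loop and differ only in the rank function.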
Submitted 17 February, 2018; v1 submitted 3 December, 2017;
originally announced December 2017.
-
Delay Asymptotics and Bounds for Multi-Task Parallel Jobs
Authors:
Weina Wang,
Mor Harchol-Balter,
Haotian Jiang,
Alan Scheller-Wolf,
R. Srikant
Abstract:
We study delay of jobs that consist of multiple parallel tasks, which is a critical performance metric in a wide range of applications such as data file retrieval in coded storage systems and parallel computing. In this problem, each job is completed only when all of its tasks are completed, so the delay of a job is the maximum of the delays of its tasks. Despite the wide attention this problem has received, tight analysis is still largely unknown since analyzing job delay requires characterizing the complicated correlation among task delays, which is hard to do.
We first consider an asymptotic regime where the number of servers, $n$, goes to infinity, and the number of tasks in a job, $k^{(n)}$, is allowed to increase with $n$. We establish the asymptotic independence of any $k^{(n)}$ queues under the condition $k^{(n)} = o(n^{1/4})$. This greatly generalizes the asymptotic-independence type of results in the literature where asymptotic independence is shown only for a fixed constant number of queues. As a consequence of our independence result, the job delay converges to the maximum of independent task delays.
We next consider the non-asymptotic regime. Here we prove that independence yields a stochastic upper bound on job delay for any $n$ and any $k^{(n)}$ with $k^{(n)}\le n$. The key component of our proof is a new technique we develop, called "Poisson oversampling". Our approach converts the job delay problem into a corresponding balls-and-bins problem. However, in contrast with typical balls-and-bins problems where there is a negative correlation among bins, we prove that our variant exhibits positive correlation.
Submitted 15 September, 2018; v1 submitted 1 October, 2017;
originally announced October 2017.