-
Improved and Oracle-Efficient Online $\ell_1$-Multicalibration
Authors:
Rohan Ghuge,
Vidya Muthukumar,
Sahil Singla
Abstract:
We study \emph{online multicalibration}, a framework for ensuring calibrated predictions across multiple groups in adversarial settings, across $T$ rounds. Although online calibration is typically studied in the $\ell_1$ norm, prior approaches to online multicalibration have taken the indirect approach of obtaining rates in other norms (such as $\ell_2$ and $\ell_{\infty}$) and then transferred these guarantees to $\ell_1$ at additional loss. In contrast, we propose a direct method that achieves improved and oracle-efficient rates of $\widetilde{\mathcal{O}}(T^{-1/3})$ and $\widetilde{\mathcal{O}}(T^{-1/4})$ respectively, for online $\ell_1$-multicalibration. Our key insight is a novel reduction of online \(\ell_1\)-multicalibration to an online learning problem with product-based rewards, which we refer to as \emph{online linear-product optimization} ($\mathtt{OLPO}$).
To obtain the improved rate of $\widetilde{\mathcal{O}}(T^{-1/3})$, we introduce a linearization of $\mathtt{OLPO}$ and design a no-regret algorithm for this linearized problem. Although this method guarantees the desired sublinear rate (nearly matching the best rate for online calibration), it is computationally expensive when the group family \(\mathcal{H}\) is large or infinite, since it enumerates all possible groups. To address scalability, we propose a second approach to $\mathtt{OLPO}$ that makes only a polynomial number of calls to an offline optimization (\emph{multicalibration evaluation}) oracle, resulting in \emph{oracle-efficient} online \(\ell_1\)-multicalibration with a rate of $\widetilde{\mathcal{O}}(T^{-1/4})$. Our framework also extends to certain infinite families of groups (e.g., all linear functions on the context space) by exploiting a $1$-Lipschitz property of the \(\ell_1\)-multicalibration error with respect to \(\mathcal{H}\).
Submitted 28 May, 2025; v1 submitted 22 May, 2025;
originally announced May 2025.
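As a rough illustration of the quantity the abstract above bounds, the following sketch computes an empirical $\ell_1$-multicalibration error for a finite group family. The exact definition and normalization used in the paper are not given in the abstract, so the formula here (maximum over groups of the summed absolute bias per predicted value, divided by $T$) is an assumption, and the function name `l1_multicalibration_error` is hypothetical.

```python
from collections import defaultdict

def l1_multicalibration_error(contexts, predictions, outcomes, groups):
    """Empirical l1-multicalibration error, normalized by T (assumed form).

    For each group function h, accumulate h(x_t) * (y_t - p_t) separately
    for every distinct predicted value p, sum the absolute values of those
    per-prediction biases, and return the worst group's total divided by T.
    """
    T = len(predictions)
    worst = 0.0
    for h in groups:
        bias = defaultdict(float)  # predicted value -> signed bias on group h
        for x, p, y in zip(contexts, predictions, outcomes):
            bias[p] += h(x) * (y - p)
        worst = max(worst, sum(abs(b) for b in bias.values()))
    return worst / T
```

Enumerating `groups` explicitly, as this sketch does, is the step that becomes expensive when the family is large or infinite, which is the scalability issue the abstract's oracle-efficient approach is meant to address.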
-
Single-Sample and Robust Online Resource Allocation
Authors:
Rohan Ghuge,
Sahil Singla,
Yifan Wang
Abstract:
The Online Resource Allocation problem is central to many areas of Computer Science, Operations Research, and Economics. In this problem, we sequentially receive $n$ stochastic requests for $m$ kinds of shared resources, where each request can be satisfied in multiple ways, consuming different amounts of resources and generating different values. The goal is to achieve a $(1-ε)$-approximation to the hindsight optimum, where $ε>0$ is a small constant, assuming each resource has a large budget.
In this paper, we investigate the learnability and robustness of online resource allocation. Our primary contribution is a novel Exponential Pricing algorithm with the following properties: 1. It requires only a \emph{single sample} from each of the $n$ request distributions to achieve a $(1-ε)$-approximation for online resource allocation with large budgets. Such an algorithm was previously unknown, even with access to polynomially many samples, as prior work either assumed full distributional knowledge or was limited to i.i.d.\,or random-order arrivals. 2. It is robust to corruptions in the outliers model and the value augmentation model. Specifically, it maintains its $(1 - ε)$-approximation guarantee under both these robustness models, resolving the open question posed in Argue, Gupta, Molinaro, and Singla (SODA'22). 3. It operates as a simple item-pricing algorithm that ensures incentive compatibility.
The intuition behind our Exponential Pricing algorithm is that the price of a resource should adjust exponentially as it is overused or underused. It differs from conventional approaches that use an online learning algorithm for item pricing. This departure guarantees that the algorithm will never run out of any resource, but loses the usual no-regret properties of online learning algorithms, necessitating a new analytical approach.
Submitted 5 May, 2025;
originally announced May 2025.
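The intuition stated in the abstract above (prices move exponentially with over- or under-use, and requests are served by item pricing) can be illustrated with a toy sketch. This is not the paper's Exponential Pricing algorithm: the pacing rule, the step size `eta`, and both function names are assumptions made only for illustration.

```python
import math

def exponential_prices(used, budgets, t, n, eta):
    """Toy per-resource prices (assumed rule, not taken from the paper).

    A resource's price is exp(eta * overuse), where overuse is how far its
    consumption so far runs ahead of an even spending pace of budget * t / n,
    measured as a fraction of the budget. Underused resources get price < 1.
    """
    return [math.exp(eta * (u - b * t / n) / b) for u, b in zip(used, budgets)]

def choose_option(options, prices):
    """Pick the (value, consumption) option with the largest positive utility,
    where utility = value - sum of price * consumption; return None if no
    option has positive utility."""
    best, best_utility = None, 0.0
    for value, consumption in options:
        utility = value - sum(p * c for p, c in zip(prices, consumption))
        if utility > best_utility:
            best, best_utility = (value, consumption), utility
    return best
```

In this sketch each arriving request would call `exponential_prices` with the current usage and then `choose_option` on its own list of ways to be satisfied, so the request only ever sees posted per-resource prices.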
-
Stochastic Scheduling with Abandonments via Greedy Strategies
Authors:
Yihua Xu,
Rohan Ghuge,
Sebastian Perez-Salazar
Abstract:
Motivated by applications where impatience is pervasive and service times are uncertain, we study a scheduling model where jobs may depart at an unknown point in time and service times are stochastic. Initially, we have access to a single server and $n$ jobs with known non-negative values: these jobs have unknown stochastic service and departure times with known distributional information, which we assume to be independent. When the server is free, we can run an available job which occupies the server for an unknown amount of time, and collect its value. The objective is to maximize the expected total value obtained from jobs run on the server. Natural formulations of this problem suffer from the curse of dimensionality. In fact, this problem is NP-hard even in the deterministic case. Hence, we focus on efficiently computable approximation algorithms that can provide high expected reward compared to the optimal expected value. Towards this end, we first provide a compact linear programming (LP) relaxation that gives an upper bound on the expected value obtained by the optimal policy. Then we design a polynomial-time algorithm that is nearly a $(1/2)\cdot (1-1/e)$-approximation to the optimal LP value (so also to the optimal expected value). We next shift our focus to the case of independent and identically distributed (i.i.d.) service times. In this case, we show that the greedy policy that always runs the highest-valued job whenever the server is free obtains a $1/2$-approximation to the optimal expected value. Our approaches extend effortlessly and we demonstrate their flexibility by providing approximations to natural extensions of our problem. Finally, we evaluate our LP-based policies and the greedy policy empirically on synthetic and real datasets.
Submitted 21 June, 2024;
originally announced June 2024.
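The greedy policy described in the abstract above (always run the highest-valued job still present when the server is free) can be simulated on one sample path as follows. This is a minimal sketch under assumptions not stated in the abstract: a job counts as available if its departure time is strictly later than the current clock, its value is collected when it starts, and the job representation and the name `simulate_greedy` are hypothetical.

```python
import random

def simulate_greedy(jobs, rng):
    """Total value collected by the greedy policy on one sample path.

    jobs: list of (value, sample_service, sample_departure); each sampler
    takes an rng and returns a non-negative time. Departure times are drawn
    up front only to drive the simulation; the policy itself just observes
    whether a job is still present when the server becomes free.
    """
    realized = [(value, service, departure(rng)) for value, service, departure in jobs]
    # Jobs only ever leave, so scanning in decreasing value and skipping
    # departed jobs matches "run the highest-valued available job".
    realized.sort(key=lambda job: job[0], reverse=True)
    clock, total = 0.0, 0.0
    for value, service, departure_time in realized:
        if departure_time > clock:   # job is still waiting
            total += value           # value collected when the job is run
            clock += service(rng)    # server is busy for the service time
    return total
```

Averaging `simulate_greedy(jobs, random.Random(seed))` over many seeds gives a Monte Carlo estimate of the policy's expected value for a given instance.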
-
Semi-Bandit Learning for Monotone Stochastic Optimization
Authors:
Arpit Agarwal,
Rohan Ghuge,
Viswanath Nagarajan
Abstract:
Stochastic optimization is a widely used approach for optimization under uncertainty, where uncertain input parameters are modeled by random variables. Exact or approximation algorithms have been obtained for several fundamental problems in this area. However, a significant limitation of this approach is that it requires full knowledge of the underlying probability distributions. Can we still get good (approximation) algorithms if these distributions are unknown, and the algorithm needs to learn them through repeated interactions? In this paper, we resolve this question for a large class of "monotone" stochastic problems, by providing a generic online learning algorithm with $\sqrt{T \log T}$ regret relative to the best approximation algorithm (under known distributions). Importantly, our online algorithm works in a semi-bandit setting, where in each period, the algorithm only observes samples from the r.v.s that were actually probed. Our framework applies to several fundamental problems in stochastic optimization such as prophet inequality, Pandora's box, stochastic knapsack, stochastic matchings and stochastic submodular optimization.
Submitted 24 December, 2023;
originally announced December 2023.
-
Informative Path Planning with Limited Adaptivity
Authors:
Rayen Tan,
Rohan Ghuge,
Viswanath Nagarajan
Abstract:
We consider the informative path planning ($\mathtt{IPP}$) problem in which a robot interacts with an uncertain environment and gathers information by visiting locations. The goal is to minimize its expected travel cost to cover a given submodular function. Adaptive solutions, where the robot incorporates all available information to select the next location to visit, achieve the best objective. However, such a solution is resource-intensive as it entails recomputing after every visited location. A more practical approach is to design solutions with a small number of adaptive "rounds", where the robot recomputes only once at the start of each round. In this paper, we design an algorithm for $\mathtt{IPP}$ parameterized by the number $k$ of adaptive rounds, and prove a smooth trade-off between $k$ and the solution quality (relative to fully adaptive solutions). We validate our theoretical results by experiments on a real road network, where we observe that a few rounds of adaptivity suffice to obtain solutions of cost almost as good as fully-adaptive ones.
Submitted 21 November, 2023;
originally announced November 2023.
-
An Asymptotically Optimal Batched Algorithm for the Dueling Bandit Problem
Authors:
Arpit Agarwal,
Rohan Ghuge,
Viswanath Nagarajan
Abstract:
We study the $K$-armed dueling bandit problem, a variation of the traditional multi-armed bandit problem in which feedback is obtained in the form of pairwise comparisons. Previous learning algorithms have focused on the \emph{fully adaptive} setting, where the algorithm can make updates after every comparison. The "batched" dueling bandit problem is motivated by large-scale applications like web search ranking and recommendation systems, where performing sequential updates may be infeasible. In this work, we ask: \emph{is there a solution using only a few adaptive rounds that matches the asymptotic regret bounds of the best sequential algorithms for $K$-armed dueling bandits?} We answer this in the affirmative \emph{under the Condorcet condition}, a standard setting of the $K$-armed dueling bandit problem. We obtain asymptotic regret of $O(K^2\log^2(K)) + O(K\log(T))$ in $O(\log(T))$ rounds, where $T$ is the time horizon. Our regret bounds nearly match the best regret bounds known in the fully sequential setting under the Condorcet condition. Finally, in computational experiments over a variety of real-world datasets, we observe that our algorithm using $O(\log(T))$ rounds achieves almost the same performance as fully sequential algorithms (that use $T$ rounds).
Submitted 24 September, 2022;
originally announced September 2022.
-
Batched Dueling Bandits
Authors:
Arpit Agarwal,
Rohan Ghuge,
Viswanath Nagarajan
Abstract:
The $K$-armed dueling bandit problem, where the feedback is in the form of noisy pairwise comparisons, has been widely studied. Previous works have only focused on the sequential setting where the policy adapts after every comparison. However, in many applications such as search ranking and recommendation systems, it is preferable to perform comparisons in a limited number of parallel batches. We study the batched $K$-armed dueling bandit problem under two standard settings: (i) existence of a Condorcet winner, and (ii) strong stochastic transitivity and stochastic triangle inequality. For both settings, we obtain algorithms with a smooth trade-off between the number of batches and regret. Our regret bounds match the best known sequential regret bounds (up to poly-logarithmic factors), using only a logarithmic number of batches. We complement our regret analysis with a nearly-matching lower bound. Finally, we also validate our theoretical results via experiments on synthetic and real data.
Submitted 21 February, 2022;
originally announced February 2022.
-
Non-Adaptive Stochastic Score Classification and Explainable Halfspace Evaluation
Authors:
Rohan Ghuge,
Anupam Gupta,
Viswanath Nagarajan
Abstract:
Sequential testing problems involve a complex system with several components, each of which is "working" with some independent probability. The outcome of each component can be determined by performing a test, which incurs some cost. The overall system status is given by a function $f$ of the outcomes of its components. The goal is to evaluate this function $f$ by performing tests at the minimum expected cost. While there has been extensive prior work on this topic, provable approximation bounds are mainly limited to simple functions like "$k$-out-of-$n$" and halfspaces. We consider significantly more general "score classification" functions, and provide the first constant factor approximation algorithm (improving over a previous logarithmic approximation ratio). Moreover, our policy is non-adaptive: it just involves performing tests in an a priori fixed order. We also consider the related halfspace evaluation problem, where we want to evaluate some function on $d$ halfspaces (e.g., intersection of halfspaces). We show that our approach provides an $O(d^2\log d)$-approximation algorithm for this problem. Our algorithms also extend to the setting of "batched" tests, where multiple tests can be performed simultaneously while incurring an extra setup cost. Finally, we perform computational experiments that demonstrate the practical performance of our algorithm for score classification. We observe that, for most instances, the cost of our algorithm is within $50\%$ of an information-theoretic lower bound on the optimal value.
Submitted 19 August, 2023; v1 submitted 10 November, 2021;
originally announced November 2021.
-
The Power of Adaptivity for Stochastic Submodular Cover
Authors:
Rohan Ghuge,
Anupam Gupta,
Viswanath Nagarajan
Abstract:
In the stochastic submodular cover problem, the goal is to select a subset of stochastic items of minimum expected cost to cover a submodular function. Solutions in this setting correspond to sequential decision processes that select items one by one "adaptively" (depending on prior observations). While such adaptive solutions achieve the best objective, the inherently sequential nature makes them undesirable in many applications. We ask: how well can solutions with only a few adaptive rounds approximate fully-adaptive solutions? We give nearly tight answers for both independent and correlated settings, proving smooth tradeoffs between the number of adaptive rounds and the solution quality, relative to fully adaptive solutions. Experiments on synthetic and real datasets show qualitative improvements in the solutions as we allow more rounds of adaptivity; in practice, solutions with a few rounds of adaptivity are nearly as good as fully adaptive solutions.
Submitted 30 June, 2021;
originally announced June 2021.
-
Quasi-Polynomial Algorithms for Submodular Tree Orienteering and Other Directed Network Design Problems
Authors:
Rohan Ghuge,
Viswanath Nagarajan
Abstract:
We consider the following general network design problem on directed graphs. The input is an asymmetric metric $(V,c)$, root $r^{*}\in V$, monotone submodular function $f:2^V\rightarrow \mathbb{R}_+$ and budget $B$. The goal is to find an $r^{*}$-rooted arborescence $T$ of cost at most $B$ that maximizes $f(T)$. Our main result is a simple quasi-polynomial time $O(\frac{\log k}{\log\log k})$-approximation algorithm for this problem, where $k\le |V|$ is the number of vertices in an optimal solution. To the best of our knowledge, this is the first non-trivial approximation ratio for this problem. As a consequence we obtain an $O(\frac{\log^2 k}{\log\log k})$-approximation algorithm for directed (polymatroid) Steiner tree in quasi-polynomial time. We also extend our main result to a setting with additional length bounds at vertices, which leads to improved $O(\frac{\log^2 k}{\log\log k})$-approximation algorithms for the single-source buy-at-bulk and priority Steiner tree problems. For the usual directed Steiner tree problem, our result matches the best previous approximation ratio [GLL19]. Our algorithm has the advantage of being deterministic and faster: the runtime is $\exp(O(\log n\, \log^{1+ε} k))$. For polymatroid Steiner tree and single-source buy-at-bulk, our result improves prior approximation ratios by a logarithmic factor. For directed priority Steiner tree, our result seems to be the first non-trivial approximation ratio. All our approximation ratios are tight (up to constant factors) for quasi-polynomial algorithms.
Submitted 1 April, 2019; v1 submitted 4 December, 2018;
originally announced December 2018.