arXiv e-print repository

The Data-Driven Censored Newsvendor Problem

Authors: Chamsi Hssaine, Sean R. Sinclair

Abstract: We study a censored variant of the data-driven newsvendor problem, where the decision-maker must select an ordering quantity that minimizes expected overage and underage costs based only on offline censored sales data, rather than historical demand realizations. Our goal is to understand how the degree of historical demand censoring affects the performance of any learning algorithm for this problem. To isolate this impact, we adopt a distributionally robust optimization framework, evaluating policies according to their worst-case regret over an ambiguity set of distributions. This set is defined by the largest historical order quantity (the observable boundary of the dataset), and contains all distributions matching the true demand distribution up to this boundary, while allowing them to be arbitrary afterwards. We demonstrate a spectrum of achievability under demand censoring by deriving a natural necessary and sufficient condition under which vanishing regret is an achievable goal. In regimes in which it is not, we exactly characterize the information loss due to censoring: an insurmountable lower bound on the performance of any policy, even when the decision-maker has access to infinitely many demand samples. We then leverage these sharp characterizations to propose a natural robust algorithm that adapts to the historical level of demand censoring. We derive finite-sample guarantees for this algorithm across all possible censoring regimes and show its near-optimality with matching lower bounds (up to polylogarithmic factors).
We moreover demonstrate its robust performance via extensive numerical experiments on both synthetic and real-world datasets.

Submitted 18 December, 2024; v1 submitted 2 December, 2024; originally announced December 2024.

Comments: 72 pages, 9 tables, 7 figures
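A minimal sketch of the censored-data setting above, for illustration only; it is not the robust algorithm from the paper. It assumes that a recorded sale equal to its order quantity is censored (true demand was at least that large), estimates the demand CDF with the standard Kaplan-Meier estimator (our substitution), and returns the newsvendor critical-fractile quantile underage / (underage + overage). When that quantile is not resolved below the largest historical order quantity, the sketch simply returns that boundary. The function names are hypothetical.

```python
from collections import Counter


def kaplan_meier_cdf(sales, orders):
    """Kaplan-Meier estimate of P(demand <= d) from censored sales.

    Assumption: a sale equal to its order quantity is censored (demand >= sale);
    a sale below its order quantity reveals demand exactly.
    """
    exact = Counter(s for s, q in zip(sales, orders) if s < q)
    observed = Counter(sales)
    at_risk = len(sales)
    survival = 1.0
    cdf = []
    for d in sorted(observed):
        if exact[d]:
            survival *= 1.0 - exact[d] / at_risk
            cdf.append((d, 1.0 - survival))
        at_risk -= observed[d]  # everyone with sale == d leaves the risk set
    return cdf


def censored_newsvendor_order(sales, orders, underage, overage):
    """Plug-in critical-fractile order, capped at the largest historical order."""
    critical = underage / (underage + overage)
    boundary = max(orders)
    for d, prob in kaplan_meier_cdf(sales, orders):
        if prob >= critical:
            return d
    # The critical quantile lies beyond what the censored data can identify.
    return boundary
```

With sales [3, 5, 5, 2, 4] against orders [5, 5, 6, 5, 5], equal underage and overage costs give an order of 4, while a 9:1 cost ratio pushes the critical quantile past what the data identify, so the sketch falls back to the boundary of 6.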

arXiv:2409.14557

Exploiting Exogenous Structure for Sample-Efficient Reinforcement Learning

Authors: Jia Wan, Sean R. Sinclair, Devavrat Shah, Martin J. Wainwright

Abstract: We study Exo-MDPs, a structured class of Markov Decision Processes (MDPs) where the state space is partitioned into exogenous and endogenous components. Exogenous states evolve stochastically, independent of the agent's actions, while endogenous states evolve deterministically based on both state components and actions. Exo-MDPs are useful for applications including inventory control, portfolio management, and ride-sharing. Our first result is structural, establishing a representational equivalence between the classes of discrete MDPs, Exo-MDPs, and discrete linear mixture MDPs. Specifically, any discrete MDP can be represented as an Exo-MDP, and the transition and reward dynamics can be written as linear functions of the exogenous state distribution, showing that Exo-MDPs are instances of linear mixture MDPs. For unobserved exogenous states, we prove a regret upper bound of $O(H^{3/2}d\sqrt{K})$ over $K$ trajectories of horizon $H$, with $d$ as the size of the exogenous state space, and establish nearly-matching lower bounds. Our findings demonstrate how Exo-MDPs decouple sample complexity from action and endogenous state sizes, and we validate our theoretical insights with experiments on inventory control.

Submitted 5 February, 2025; v1 submitted 22 September, 2024; originally announced September 2024.

Comments: 43 pages
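A minimal sketch of the Exo-MDP structure described above, using a toy inventory problem: demand is the exogenous state and is drawn independently of actions, while inventory is the endogenous state and evolves deterministically. The prices, costs, horizon, and function names are hypothetical, and this is not the paper's algorithm; it only shows why one batch of exogenous traces can score every policy.

```python
import random


def inventory_step(inventory, order, demand, price=2.0, cost=1.0, holding=0.1):
    """Endogenous transition: deterministic given (inventory, order, demand)."""
    stock = inventory + order
    sales = min(stock, demand)
    next_inventory = stock - sales
    reward = price * sales - cost * order - holding * next_inventory
    return next_inventory, reward


def evaluate(policy, demand_traces):
    """Average return of `policy` replayed on exogenous demand traces.

    Because demand never depends on the action, the same traces can score
    every candidate policy; no new interaction is needed per policy.
    """
    total = 0.0
    for trace in demand_traces:
        inventory = 0
        for h, demand in enumerate(trace):
            order = policy(inventory, h)
            inventory, reward = inventory_step(inventory, order, demand)
            total += reward
    return total / len(demand_traces)


def base_stock(level):
    """Order up to `level` each period."""
    return lambda inventory, h: max(0, level - inventory)


if __name__ == "__main__":
    rng = random.Random(0)
    traces = [[rng.randint(0, 4) for _ in range(10)] for _ in range(200)]
    for level in range(6):
        print(level, round(evaluate(base_stock(level), traces), 3))
```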

arXiv:2408.04488

Multi-Objective LQR with Linear Scalarization

Authors: Ali Jadbabaie, Devavrat Shah, Sean R. Sinclair

Abstract: The framework of decision-making, modeled as a Markov Decision Process (MDP), typically assumes a single objective. However, practical scenarios often involve tradeoffs between multiple objectives. We address this in the Linear Quadratic Regulator (LQR), a canonical continuous, infinite-horizon MDP. First, we establish that the Pareto front for LQR is characterized by linear scalarization: a convex combination of objectives recovers all tradeoff points, making multi-objective LQR reducible to single-objective problems. This highlights an important instance where linear scalarization suffices for a non-convex problem. Second, we show the Pareto front is smooth, in that an $ε$ perturbation of a scalarization parameter yields an $ε$ approximation to the objective. These results inspire a simple algorithm to approximate the Pareto front via grid search over scalarization parameters, where each optimization problem retains the computational efficiency of single-objective LQR. Lastly, we extend the analysis to certainty equivalence, where unknown dynamics are replaced with estimates.

Submitted 15 January, 2025; v1 submitted 8 August, 2024; originally announced August 2024.

Comments: 38 pages, 2 figures
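A minimal sketch of the grid-search idea above for a scalar, discrete-time LQR with two quadratic objectives (q1, r1) and (q2, r2); the scalar discrete-time setting, the fixed-point Riccati solver, and all names are our assumptions. Each weight lam solves one single-objective LQR with cost lam*(q1, r1) + (1 - lam)*(q2, r2), and the resulting gain is scored under both objectives to trace an approximate Pareto front.

```python
def solve_scalar_dare(a, b, q, r, tol=1e-12, max_iter=100_000):
    """Fixed-point iteration on the scalar discrete-time Riccati equation."""
    p = q
    for _ in range(max_iter):
        p_next = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
        if abs(p_next - p) < tol:
            return p_next
        p = p_next
    raise RuntimeError("Riccati iteration did not converge")


def lqr_gain(a, b, q, r):
    """Optimal stationary gain K for the feedback law u = -K x."""
    p = solve_scalar_dare(a, b, q, r)
    return a * b * p / (r + b * b * p)


def closed_loop_cost(a, b, k, q, r, x0=1.0):
    """Infinite-horizon cost sum_t (q x_t^2 + r u_t^2) under u = -K x."""
    a_cl = a - b * k
    if abs(a_cl) >= 1.0:
        return float("inf")
    return (q + r * k * k) * x0 * x0 / (1.0 - a_cl * a_cl)


def pareto_front(a, b, obj1, obj2, num_points=11):
    """Grid search over the scalarization weight lam in [0, 1]."""
    (q1, r1), (q2, r2) = obj1, obj2
    front = []
    for i in range(num_points):
        lam = i / (num_points - 1)
        k = lqr_gain(a, b, lam * q1 + (1 - lam) * q2, lam * r1 + (1 - lam) * r2)
        front.append((lam, closed_loop_cost(a, b, k, q1, r1),
                      closed_loop_cost(a, b, k, q2, r2)))
    return front
```

Because the scalarized cost is linear in (q, r), raising lam can only lower the first objective and raise the second, which the sampled front reflects.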

arXiv:2406.02402

Online Fair Allocation of Perishable Resources

Authors: Siddhartha Banerjee, Chamsi Hssaine, Sean R. Sinclair

Abstract: We consider a practically motivated variant of the canonical online fair allocation problem: a decision-maker has a budget of perishable resources to allocate over a fixed number of rounds. Each round sees a random number of arrivals, and the decision-maker must commit to an allocation for these individuals before moving on to the next round. The goal is to construct a sequence of allocations that is envy-free and efficient. Our work makes two important contributions toward this problem: we first derive strong lower bounds on the optimal envy-efficiency trade-off that demonstrate that a decision-maker is fundamentally limited in what she can hope to achieve relative to the no-perishing setting; we then design an algorithm achieving these lower bounds which takes as input $(i)$ a prediction of the perishing order and $(ii)$ a desired bound on envy. Given the remaining budget in each period, the algorithm uses forecasts of future demand and perishing to adaptively choose one of two carefully constructed guardrail quantities. On simulations calibrated to a real-world dataset, we demonstrate our algorithm's strong numerical performance and the inefficacy of state-of-the-art, perishing-agnostic algorithms.

Submitted 4 June, 2024; originally announced June 2024.

Comments: 51 pages, 8 figures

MSC Class: 91B32
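A minimal sketch of a two-guardrail rule in the spirit of the description above, for a single divisible resource: each round either the upper or the lower per-person quantity is handed out, depending on whether the remaining budget still covers the lower guardrail for forecast future arrivals plus forecast perishing. The guardrail values, the oracle forecasts, the end-of-round spoilage model, and the function names are all assumptions; the paper's construction of the guardrails and its perishing-order prediction are not reproduced here.

```python
def guardrail_allocation(budget, arrivals_now, future_arrivals, future_perishing,
                         lower, upper):
    """Choose between two per-person guardrail quantities for this round.

    Give everyone `upper` only if the budget left afterwards still covers
    `lower` for each forecast future arrival plus the forecast perished units.
    """
    reserve = lower * future_arrivals + future_perishing
    if budget - upper * arrivals_now >= reserve:
        return upper
    return lower


def simulate(budget, arrivals, perishing, lower, upper):
    """Run the rule with oracle forecasts (a simplifying assumption)."""
    allocations = []
    for t, n in enumerate(arrivals):
        x = guardrail_allocation(budget, n, sum(arrivals[t + 1:]),
                                 sum(perishing[t:]), lower, upper)
        x = min(x, budget / n) if n else 0.0  # never exceed what is on hand
        budget -= x * n
        budget -= min(perishing[t], budget)  # units spoil at the end of the round
        allocations.append(x)
    return allocations, budget
```

With a budget of 10, two arrivals per round for three rounds, and guardrails 1.5 and 2.0, no perishing lets the first round receive 2.0; forecasting one spoiled unit in the first round drops every round to 1.5.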

arXiv:2210.00025

Artificial Replay: A Meta-Algorithm for Harnessing Historical Data in Bandits

Authors: Siddhartha Banerjee, Sean R. Sinclair, Milind Tambe, Lily Xu, Christina Lee Yu

Abstract: Most real-world deployments of bandit algorithms exist somewhere between the offline and online setups, where some historical data is available upfront and additional data is collected dynamically online. How best to incorporate historical data to "warm start" bandit algorithms is an open question: naively initializing reward estimates using all historical samples can suffer from spurious data and imbalanced data coverage, leading to data inefficiency (in terms of the amount of historical data used), particularly for continuous action spaces. To address these challenges, we propose ArtificialReplay, a meta-algorithm for incorporating historical data into an arbitrary base bandit algorithm. We show that ArtificialReplay uses only a fraction of the historical data compared to a full warm-start approach, while still achieving identical regret for base algorithms that satisfy independence of irrelevant data (IIData), a novel and broadly applicable property that we introduce. We complement these theoretical results with experiments on K-armed bandits and continuous combinatorial bandits, on which we model green security domains using real poaching data. Our results show the practical benefits of ArtificialReplay for improving data efficiency, including for base algorithms that do not satisfy IIData.

Submitted 19 March, 2025; v1 submitted 30 September, 2022; originally announced October 2022.

Comments: 55 pages (30 pages main paper), 9 figures
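A minimal sketch of the replay idea as we read it from the abstract: the base algorithm (UCB1 here, an arbitrary choice) proposes an arm, and if an unused historical reward exists for that arm it is fed to the base algorithm instead of acting online; otherwise the arm is played online. The history format, the `pull` callback, and the function names are assumptions, and the sketch does not check the IIData property.

```python
import math
import random


class UCB1:
    """Standard UCB1 for K arms with rewards in [0, 1]."""

    def __init__(self, num_arms):
        self.counts = [0] * num_arms
        self.sums = [0.0] * num_arms

    def select(self):
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm
        t = sum(self.counts)
        return max(range(len(self.counts)),
                   key=lambda a: self.sums[a] / self.counts[a]
                   + math.sqrt(2.0 * math.log(t) / self.counts[a]))

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward


def artificial_replay(base, history, pull, horizon):
    """Warm-start `base` with historical rewards only when it asks for them.

    Before each online round, while the base algorithm proposes an arm that
    still has unused historical samples, feed one such sample to it instead
    of acting online. Returns the number of historical samples consumed.
    """
    unused = {arm: list(rewards) for arm, rewards in history.items()}
    used = 0
    for _ in range(horizon):
        arm = base.select()
        while unused.get(arm):
            base.update(arm, unused[arm].pop())
            used += 1
            arm = base.select()
        base.update(arm, pull(arm))
    return used


if __name__ == "__main__":
    rng = random.Random(0)
    means = [0.2, 0.8]
    history = {0: [0.0] * 50, 1: [1.0] * 50}
    base = UCB1(num_arms=2)
    used = artificial_replay(base, history,
                             lambda a: float(rng.random() < means[a]), 200)
    print("historical samples used:", used, "of", 100)
```

Samples for an arm are consumed only when the base algorithm actually asks for that arm, so history for arms it rarely proposes can stay unused.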

arXiv:2207.06272

Hindsight Learning for MDPs with Exogenous Inputs

Authors: Sean R. Sinclair, Felipe Frujeri, Ching-An Cheng, Luke Marshall, Hugo Barbalho, Jingling Li, Jennifer Neville, Ishai Menache, Adith Swaminathan

Abstract: Many resource management problems require sequential decision-making under uncertainty, where the only uncertainty affecting decision outcomes comes from exogenous variables outside the control of the decision-maker. We model these problems as Exo-MDPs (Markov Decision Processes with Exogenous Inputs) and design a class of data-efficient algorithms for them termed Hindsight Learning (HL). Our HL algorithms achieve data efficiency by leveraging a key insight: given samples of the exogenous variables, past decisions can be revisited in hindsight to infer counterfactual consequences that can accelerate policy improvement. We compare HL against classic baselines in the multi-secretary and airline revenue management problems. We also scale our algorithms to a business-critical cloud resource management problem, allocating Virtual Machines (VMs) to physical machines, and simulate their performance with real datasets from a large public cloud provider. We find that HL algorithms outperform domain-specific heuristics, as well as state-of-the-art reinforcement learning methods.

Submitted 23 October, 2023; v1 submitted 13 July, 2022; originally announced July 2022.

Comments: 52 pages, 6 figures

MSC Class: 68Q32 ACM Class: I.2.6
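A minimal sketch of the hindsight insight above on the multi-secretary problem: once a trace of exogenous arrival values is known, the hindsight-optimal plan with capacity k accepts the k largest values, and those counterfactual decisions can serve as training targets. Fitting a single accept threshold by averaging hindsight cutoffs is our deliberately simple stand-in for the policy-improvement step; it is not the paper's HL algorithm, and the function names are hypothetical.

```python
def hindsight_labels(values, capacity):
    """Hindsight-optimal accept/reject decisions for a multi-secretary trace.

    With every arrival's value known in advance, the optimal plan accepts
    the `capacity` largest values (ties broken by earlier arrival).
    """
    ranked = sorted(range(len(values)), key=lambda i: (-values[i], i))
    accepted = set(ranked[:capacity])
    return [i in accepted for i in range(len(values))]


def fit_threshold(traces, capacity):
    """Fit a single accept threshold by averaging hindsight cutoffs.

    A deliberately simple stand-in for imitating hindsight-optimal actions.
    Assumes capacity >= 1 and every trace has at least one arrival.
    """
    cutoffs = []
    for values in traces:
        labels = hindsight_labels(values, capacity)
        cutoffs.append(min(v for v, keep in zip(values, labels) if keep))
    return sum(cutoffs) / len(cutoffs)
```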

arXiv:2110.15843

doi 10.1287/opre.2022.2396

Adaptive Discretization in Online Reinforcement Learning

Authors: Sean R. Sinclair, Siddhartha Banerjee, Christina Lee Yu

Abstract: Discretization-based approaches to solving online reinforcement learning problems have been studied extensively in practice on applications ranging from resource allocation to cache management. Two major questions in designing discretization-based algorithms are how to create the discretization and when to refine it. While there have been several experimental results investigating heuristic solutions to these questions, there has been little theoretical treatment. In this paper we provide a unified theoretical analysis of tree-based hierarchical partitioning methods for online reinforcement learning, providing model-free and model-based algorithms. We show how our algorithms are able to take advantage of the inherent structure of the problem by providing guarantees that scale with the 'zooming dimension', an instance-dependent quantity measuring the benignness of the optimal $Q_h^\star$ function, rather than with the ambient dimension. Many applications in computing systems and operations research require algorithms that compete on three facets: low sample complexity, mild storage requirements, and low computational burden. Our algorithms are easily adapted to operating constraints, and our theory provides explicit bounds across each of the three facets. This motivates their use in practical applications, as our approach automatically adapts to the underlying problem structure even when very little is known a priori about the system.

Submitted 10 October, 2022; v1 submitted 29 October, 2021; originally announced October 2021.

Comments: 77 pages, 7 figures. arXiv admin note: text overlap with arXiv:2007.00717

MSC Class: 68Q32 ACM Class: I.2.6

arXiv:2105.05308

Sequential Fair Allocation: Achieving the Optimal Envy-Efficiency Tradeoff Curve

Authors: Sean R. Sinclair, Gauri Jain, Siddhartha Banerjee, Christina Lee Yu

Abstract: We consider the problem of dividing limited resources among individuals arriving over $T$ rounds. In each round a random number of individuals arrive, and each individual is characterized by their type (i.e., preferences over the different resources). A standard notion of 'fairness' in this setting requires that an allocation simultaneously satisfy envy-freeness and efficiency. The former is an individual guarantee, requiring that each agent prefers their own allocation over the allocation of any other; in contrast, efficiency is a global property, requiring that the allocations clear the available resources. For divisible resources, when the number of individuals of each type is known upfront, the above desiderata are simultaneously achievable for a large class of utility functions. However, in an online setting, where the number of individuals of each type is revealed only round by round, no policy can guarantee these desiderata simultaneously, and hence the best one can do is to try to allocate so as to approximately satisfy the two properties. We show that in the online setting, the two desired properties (envy-freeness and efficiency) are in direct contention, in that any algorithm achieving additive counterfactual envy-freeness up to a factor of $L_T$ necessarily suffers an efficiency loss of at least $1 / L_T$. We complement this uncertainty principle with a simple algorithm, HopeGuardrail, which allocates resources based on an adaptive threshold policy and is able to achieve any fairness-efficiency point on this frontier.
In simulations, our algorithm provides allocations close to the optimal fair solution in hindsight, motivating its use in practical applications, as the algorithm can adapt to any desired fairness-efficiency trade-off.

Submitted 29 September, 2022; v1 submitted 11 May, 2021; originally announced May 2021.

Comments: 42 pages, 5 figures

MSC Class: 91B32
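A minimal sketch of how the two properties above can be measured for one divisible resource, assuming every individual has the same linear utility, so that the hindsight fair solution is an equal split of the budget. Max envy is the largest gap between any two individuals' allocations, the gap to the hindsight share compares each allocation with budget / (total arrivals), and waste is the unallocated budget. These simplified diagnostics and the function name are our assumptions, not the paper's exact definitions of counterfactual envy or efficiency.

```python
def envy_and_waste(allocations, arrivals, budget):
    """Simplified fairness/efficiency diagnostics for one divisible resource.

    Assumes every individual has the same linear utility, so an allocation
    is summarized by the per-person amount given in each round.
    """
    used = sum(x * n for x, n in zip(allocations, arrivals))
    if used > budget + 1e-9:
        raise ValueError("allocations exceed the budget")
    served = [x for x, n in zip(allocations, arrivals) if n > 0]
    share = budget / sum(arrivals)  # equal split in hindsight
    return {
        "max_envy": max(served) - min(served),
        "gap_to_hindsight_share": max(abs(x - share) for x in served),
        "waste": budget - used,
    }
```

For example, allocations [2.0, 1.5, 1.5] to two arrivals per round under a budget of 10 waste nothing but leave a max envy of 0.5. The sketch only measures this tension; it does not reproduce the $L_T$ versus $1/L_T$ bound.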

arXiv:2011.14382

Sequential Fair Allocation of Limited Resources under Stochastic Demands

Authors: Sean R. Sinclair, Gauri Jain, Siddhartha Banerjee, Christina Lee Yu

Abstract: We consider the problem of dividing limited resources among a set of agents arriving sequentially with unknown (stochastic) utilities. Our goal is to find a fair allocation, one that is simultaneously Pareto-efficient and envy-free. When all utilities are known upfront, the above desiderata are simultaneously achievable (and efficiently computable) for a large class of utility functions. In a sequential setting, however, no policy can guarantee these desiderata simultaneously for all possible utility realizations. A natural online fair allocation objective is to minimize the deviation of each agent's final allocation from their fair allocation in hindsight. This translates into simultaneous guarantees for both Pareto-efficiency and envy-freeness. However, the resulting dynamic program has a state space that is exponential in the number of agents. We propose a simple policy, HopeOnline, that instead aims to 'match' the ex-post fair allocation vector using the currently available resources and a 'predicted' histogram of future utilities. We demonstrate the effectiveness of our policy compared to other heuristics on a dataset inspired by mobile food-bank allocations.

Submitted 9 July, 2022; v1 submitted 29 November, 2020; originally announced November 2020.

Comments: See arXiv:2105.05308 for an updated version. 36 pages, 6 figures

MSC Class: 91B32
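A minimal sketch of the 'match the ex-post fair allocation' idea above, collapsed to a single resource and a single utility type, where the fair allocation in hindsight is an equal split: each round gives every current arrival the remaining budget divided by current plus expected future arrivals. The forecast input and the function name are assumptions; the paper's policy works with a predicted histogram of heterogeneous utilities.

```python
def hope_online_single_type(budget, arrivals, expected_future):
    """Give each current arrival budget / (current + expected future arrivals).

    expected_future[t] forecasts the total number of arrivals after round t.
    """
    allocations = []
    for n, future in zip(arrivals, expected_future):
        if n == 0:
            allocations.append(0.0)
            continue
        per_person = budget / (n + future)  # never more than budget / n
        budget -= per_person * n
        allocations.append(per_person)
    return allocations, budget
```

With a budget of 12 and two arrivals in each of three rounds, exact forecasts [4, 2, 0] give everyone 2.0, whereas under-forecasting the future as [2, 2, 0] front-loads 3.0 to the first round and leaves 1.5 for each later arrival.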

arXiv:2007.00717

Adaptive Discretization for Model-Based Reinforcement Learning

Authors: Sean R. Sinclair, Tianyu Wang, Gauri Jain, Siddhartha Banerjee, Christina Lee Yu

Abstract: We introduce the technique of adaptive discretization to design an efficient model-based episodic reinforcement learning algorithm in large (potentially continuous) state-action spaces. Our algorithm is based on optimistic one-step value iteration extended to maintain an adaptive discretization of the space. From a theoretical perspective, we provide worst-case regret bounds for our algorithm that are competitive with those of state-of-the-art model-based algorithms. Moreover, our bounds are obtained via a modular proof technique which can potentially extend to incorporate additional structure on the problem. From an implementation standpoint, our algorithm has much lower storage and computational requirements due to maintaining a more efficient partition of the state and action spaces. We illustrate this via experiments on several canonical control problems, which show that our algorithm empirically performs significantly better than fixed discretization in terms of both faster convergence and lower memory usage. Interestingly, we observe empirically that while fixed-discretization model-based algorithms vastly outperform their model-free counterparts, the two achieve comparable performance with adaptive discretization.

Submitted 23 October, 2020; v1 submitted 1 July, 2020; originally announced July 2020.

Comments: 50 pages, 7 figures

MSC Class: 68Q32 ACM Class: I.2.6

arXiv:1910.08151

doi 10.1145/3366703

Adaptive Discretization for Episodic Reinforcement Learning in Metric Spaces

Authors: Sean R. Sinclair, Siddhartha Banerjee, Christina Lee Yu

Abstract: We present an efficient algorithm for model-free episodic reinforcement learning on large (potentially continuous) state-action spaces. Our algorithm is based on a novel $Q$-learning policy with adaptive data-driven discretization. The central idea is to maintain a finer partition of the state-action space in regions that are frequently visited in historical trajectories and have higher payoff estimates. We demonstrate how our adaptive partitions take advantage of the shape of the optimal $Q$-function and the joint space, without sacrificing the worst-case performance. In particular, we recover the regret guarantees of prior algorithms for continuous state-action spaces, which additionally require an optimal discretization as input and/or access to a simulation oracle. Moreover, experiments demonstrate how our algorithm automatically adapts to the underlying structure of the problem, resulting in much better performance than both heuristics and $Q$-learning with uniform discretization.

Submitted 31 October, 2019; v1 submitted 17 October, 2019; originally announced October 2019.

Comments: 46 pages, 15 figures

MSC Class: 68Q32 ACM Class: I.2.6
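A minimal sketch of the adaptive-partitioning idea above, simplified to a single state so that only a one-dimensional action interval [0, 1] is partitioned. A region is chosen by an optimistic index (running mean plus a 1/sqrt(count) bonus plus its diameter, which assumes rewards are roughly 1-Lipschitz), and it is split in half once its visit count reaches (1/diameter)^2, so frequently chosen, high-payoff regions are refined first. The bonus form, split threshold, and names are assumptions rather than the paper's constants.

```python
import math
from dataclasses import dataclass


@dataclass(eq=False)  # compare regions by identity so list.remove is exact
class Region:
    lo: float
    hi: float
    mean: float = 0.0
    count: int = 0

    @property
    def diameter(self):
        return self.hi - self.lo

    def index(self):
        """Optimistic score: running mean + confidence bonus + diameter term."""
        if self.count == 0:
            return math.inf
        return self.mean + 1.0 / math.sqrt(self.count) + self.diameter


class AdaptivePartition:
    """Adaptive partition of the action interval [0, 1] (single-state case)."""

    def __init__(self):
        self.regions = [Region(0.0, 1.0)]

    def select(self):
        return max(self.regions, key=Region.index)

    def update(self, region, reward):
        region.count += 1
        region.mean += (reward - region.mean) / region.count  # running average
        # Refine once the region has been chosen (1 / diameter)^2 times.
        if region.count >= (1.0 / region.diameter) ** 2:
            mid = (region.lo + region.hi) / 2
            self.regions.remove(region)
            self.regions += [Region(region.lo, mid), Region(mid, region.hi)]


def run(reward_fn, steps):
    """Repeatedly play the midpoint of the highest-index region."""
    partition = AdaptivePartition()
    for _ in range(steps):
        region = partition.select()
        midpoint = (region.lo + region.hi) / 2
        partition.update(region, reward_fn(midpoint))
    return partition
```

For instance, run(lambda x: 1.0 - abs(x - 0.7), 500) returns a partition whose regions still tile [0, 1]; pulls concentrate on high-index regions, so refinement tends to be deepest where the estimated payoff is highest.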

arXiv:1608.02806

doi 10.1007/s00285-017-1125-6

Normal and pathological dynamics of platelets in humans

Authors: Gabriel P. Langlois, Morgan Craig, Antony R. Humphries, Michael C. Mackey, Joseph M. Mahaffy, Jacques Bélair, Thibault Moulin, Sean R. Sinclair, Liangliang Wang

Abstract: We develop a comprehensive mathematical model of platelet, megakaryocyte, and thrombopoietin dynamics in humans. We show that there is a single stationary solution that can undergo a Hopf bifurcation, and use this information to investigate both normal and pathological platelet production, specifically cyclic thrombocytopenia. Carefully estimating model parameters from laboratory and clinical data, we then argue that a subset of parameters is involved in the genesis of cyclic thrombocytopenia based on clinical information. We provide excellent model fits to the existing data for both platelet counts and thrombopoietin levels by changing six parameters that have physiological correlates. Our results indicate that the primary change in cyclic thrombocytopenia is a major interference with or destruction of the thrombopoietin receptor, with secondary changes in other processes, including immune-mediated destruction of platelets and megakaryocyte deficiency and failure in platelet production. This study makes a major contribution to the understanding of the origin of cyclic thrombopoietin as well as significantly extending the modeling of thrombopoiesis.

Submitted 26 January, 2017; v1 submitted 29 July, 2016; originally announced August 2016.

MSC Class: 37N25; 92B99; 92C30; 37G15

Journal ref: Journal of Mathematical Biology volume 75, pages 1411-1462 (2017)

Showing 1–12 of 12 results for author: Sinclair, S R