Showing 1–47 of 47 results for author: Simchi-Levi, D

Searching in archive cs.
  1. arXiv:2507.07852  [pdf, ps, other]

    cs.LG stat.ML

    Pre-Trained AI Model Assisted Online Decision-Making under Missing Covariates: A Theoretical Perspective

    Authors: Haichen Hu, David Simchi-Levi

    Abstract: We study a sequential contextual decision-making problem in which certain covariates are missing but can be imputed using a pre-trained AI model. From a theoretical perspective, we analyze how the presence of such a model influences the regret of the decision-making process. We introduce a novel notion called "model elasticity", which quantifies the sensitivity of the reward function to the discre…

    Submitted 10 July, 2025; originally announced July 2025.

  2. arXiv:2505.07101  [pdf, other]

    stat.ML cs.LG

    Constrained Online Decision-Making: A Unified Framework

    Authors: Haichen Hu, David Simchi-Levi, Navid Azizan

    Abstract: Contextual online decision-making problems with constraints appear in a wide range of real-world applications, such as adaptive experimental design under safety constraints, personalized recommendation with resource limits, and dynamic pricing under fairness requirements. In this paper, we investigate a general formulation of sequential decision-making with stage-wise feasibility constraints, wher…

    Submitted 22 May, 2025; v1 submitted 11 May, 2025; originally announced May 2025.

  3. arXiv:2504.11320  [pdf, other]

    cs.LG cs.AI cs.DC math.OC stat.ML

    Optimizing LLM Inference: Fluid-Guided Online Scheduling with Memory Constraints

    Authors: Ruicheng Ao, Gan Luo, David Simchi-Levi, Xinshang Wang

    Abstract: Large Language Models (LLMs) are indispensable in today's applications, but their inference procedure -- generating responses by processing text in segments and using a memory-heavy Key-Value (KV) cache -- demands significant computational resources, particularly under memory constraints. This paper formulates LLM inference optimization as a multi-stage online scheduling problem where sequential p…

    Submitted 15 April, 2025; originally announced April 2025.

    Comments: 42 pages, 18 figures

  4. arXiv:2502.13115  [pdf, ps, other]

    cs.LG cs.AI cs.CR math.ST stat.ML

    Near-Optimal Private Learning in Linear Contextual Bandits

    Authors: Fan Chen, Jiachun Li, Alexander Rakhlin, David Simchi-Levi

    Abstract: We analyze the problem of private learning in generalized linear contextual bandits. Our approach is based on a novel method of re-weighted regression, yielding an efficient algorithm with regret of order $\sqrt{T}+\frac{1}{α}$ and $\sqrt{T}/α$ in the joint and local model of $α$-privacy, respectively. Further, we provide near-optimal private procedures that achieve dimension-independent rates in pr…

    Submitted 18 February, 2025; originally announced February 2025.

  5. arXiv:2501.18359  [pdf, other]

    stat.ML cs.LG

    Contextual Online Decision Making with Infinite-Dimensional Functional Regression

    Authors: Haichen Hu, Rui Ai, Stephen Bates, David Simchi-Levi

    Abstract: Contextual sequential decision-making problems play a crucial role in machine learning, encompassing a wide range of downstream applications such as bandits, sequential hypothesis testing and online risk control. These applications often require different statistical measures, including expectation, variance and quantiles. In this paper, we provide a universal admissible algorithm framework for de…

    Submitted 30 January, 2025; originally announced January 2025.

    Comments: 30 pages

  6. arXiv:2501.14155  [pdf, other]

    math.OC cs.LG

    Learning to Price with Resource Constraints: From Full Information to Machine-Learned Prices

    Authors: Ruicheng Ao, Jiashuo Jiang, David Simchi-Levi

    Abstract: We study the dynamic pricing problem with knapsack, addressing the challenge of balancing exploration and exploitation under resource constraints. We introduce three algorithms tailored to different informational settings: a Boundary Attracted Re-solve Method for full information, an online learning algorithm for scenarios with no prior information, and an estimate-then-select re-solve algorithm t…

    Submitted 23 January, 2025; originally announced January 2025.

    Comments: 28 pages, 4 figures

  7. arXiv:2411.12036  [pdf, other]

    stat.ML cs.LG econ.EM

    Prediction-Guided Active Experiments

    Authors: Ruicheng Ao, Hongyu Chen, David Simchi-Levi

    Abstract: In this work, we introduce a new framework for active experimentation, the Prediction-Guided Active Experiment (PGAE), which leverages predictions from an existing machine learning model to guide sampling and experimentation. Specifically, at each time step, an experimental unit is sampled according to a designated sampling distribution, and the actual outcome is observed based on an experimental…

    Submitted 20 November, 2024; v1 submitted 18 November, 2024; originally announced November 2024.

    Comments: 25 pages, 11 figures

  8. arXiv:2410.05552  [pdf, ps, other]

    stat.ML cs.LG

    Optimal Adaptive Experimental Design for Estimating Treatment Effect

    Authors: Jiachun Li, David Simchi-Levi, Yunxiao Zhao

    Abstract: Given n experiment subjects with potentially heterogeneous covariates and two possible treatments, namely active treatment and control, this paper addresses the fundamental question of determining the optimal accuracy in estimating the treatment effect. Furthermore, we propose an experimental design that approaches this optimal accuracy, giving a (non-asymptotic) answer to this fundamental yet sti…

    Submitted 11 November, 2024; v1 submitted 7 October, 2024; originally announced October 2024.

    Comments: Delete unrelated figure, update new lower bound results

  9. arXiv:2407.19618  [pdf, other]

    stat.ME cs.LG econ.EM stat.AP stat.ML

    Experimenting on Markov Decision Processes with Local Treatments

    Authors: Shuze Chen, David Simchi-Levi, Chonghuan Wang

    Abstract: Utilizing randomized experiments to evaluate the effect of short-term treatments on short-term outcomes is well understood and has become the gold standard in industrial practice. However, as service systems become increasingly dynamical and personalized, much focus is shifting toward maximizing long-term cumulative outcomes, such as customer lifetime value, through lifetime exposure to in…

    Submitted 17 October, 2024; v1 submitted 28 July, 2024; originally announced July 2024.

  10. arXiv:2405.17796  [pdf, ps, other]

    cs.LG stat.ML

    Offline Oracle-Efficient Learning for Contextual MDPs via Layerwise Exploration-Exploitation Tradeoff

    Authors: Jian Qian, Haichen Hu, David Simchi-Levi

    Abstract: Motivated by the recent discovery of a statistical and computational reduction from contextual bandits to offline regression (Simchi-Levi and Xu, 2021), we address the general (stochastic) Contextual Markov Decision Process (CMDP) problem with horizon H (also known as CMDP with H layers). In this paper, we introduce a reduction from CMDPs to offline density estimation under the realizability assumpt…

    Submitted 27 May, 2024; originally announced May 2024.

  11. arXiv:2404.09413  [pdf, other]

    stat.ML cs.CR cs.LG

    On the Optimal Regret of Locally Private Linear Contextual Bandit

    Authors: Jiachun Li, David Simchi-Levi, Yining Wang

    Abstract: The contextual bandit with linear reward functions is one of the most extensively studied models in bandit and online learning research. Recently, there has been increasing interest in designing \emph{locally private} linear contextual bandit algorithms, where sensitive information contained in contexts and rewards is protected against leakage to the general public. While the classical linear co…

    Submitted 14 April, 2024; originally announced April 2024.

  12. arXiv:2402.11425  [pdf, other]

    stat.ME cs.LG math.OC math.PR

    Bayesian Online Multiple Testing: A Resource Allocation Approach

    Authors: Ruicheng Ao, Hongyu Chen, David Simchi-Levi, Feng Zhu

    Abstract: We consider the problem of sequentially conducting multiple experiments where each experiment corresponds to a hypothesis testing task. At each time point, the experimenter must make an irrevocable decision of whether to reject the null hypothesis (or equivalently claim a discovery) before the next experimental result arrives. The goal is to maximize the number of discoveries while maintaining a l…

    Submitted 15 July, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

  13. arXiv:2401.08224  [pdf, other]

    stat.ME cs.CR cs.LG

    Privacy Preserving Adaptive Experiment Design

    Authors: Jiachun Li, Kaining Shi, David Simchi-Levi

    Abstract: Adaptive experiments are widely adopted to estimate the conditional average treatment effect (CATE) in clinical trials and many other scenarios. While the primary goal of an experiment is to maximize estimation accuracy, due to the imperative of social welfare, it's also crucial to provide treatment with superior outcomes to patients, which is measured by regret in the contextual bandit framework. These two ob…

    Submitted 5 February, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

    Comments: Add a table

  14. arXiv:2311.16528  [pdf, other]

    stat.ML cs.LG

    Utility Fairness in Contextual Dynamic Pricing with Demand Learning

    Authors: Xi Chen, David Simchi-Levi, Yining Wang

    Abstract: This paper introduces a novel contextual bandit algorithm for personalized pricing under utility fairness constraints in scenarios with uncertain demand, achieving an optimal regret upper bound. Our approach, which incorporates dynamic pricing and demand learning, addresses the critical challenge of fairness in pricing strategies. We first delve into the static full-information setting to formulat…

    Submitted 28 November, 2023; originally announced November 2023.

  15. arXiv:2304.04341  [pdf, ps, other]

    stat.ML cs.LG math.ST stat.ME

    Regret Distribution in Stochastic Bandits: Optimal Trade-off between Expectation and Tail Risk

    Authors: David Simchi-Levi, Zeyu Zheng, Feng Zhu

    Abstract: We study the trade-off between expectation and tail risk for regret distribution in the stochastic multi-armed bandit problem. We fully characterize the interplay among three desired properties for policy design: worst-case optimality, instance-dependent consistency, and light-tailed risk. We show how the order of expected regret exactly affects the decaying rate of the regret tail probability for…

    Submitted 9 April, 2023; originally announced April 2023.

    Comments: arXiv admin note: text overlap with arXiv:2206.02969

  16. arXiv:2209.13099  [pdf, other]

    cs.GT

    Bayesian Mechanism Design for Blockchain Transaction Fee Allocation

    Authors: Xi Chen, David Simchi-Levi, Zishuo Zhao, Yuan Zhou

    Abstract: In blockchain systems, the design of transaction fee mechanisms is essential for stability and satisfaction for both miners and users. A recent work has proven the impossibility of collusion-proof mechanisms that achieve both non-zero miner revenue and Dominating-Strategy-Incentive-Compatible (DSIC) for users. However, a positive miner revenue is important in practice to motivate miners. To addres…

    Submitted 23 December, 2024; v1 submitted 26 September, 2022; originally announced September 2022.

    Comments: 71 pages, Operations Research (2025)

  17. arXiv:2206.02969  [pdf, other]

    stat.ML cs.LG math.ST

    A Simple and Optimal Policy Design with Safety against Heavy-Tailed Risk for Stochastic Bandits

    Authors: David Simchi-Levi, Zeyu Zheng, Feng Zhu

    Abstract: We study the stochastic multi-armed bandit problem and design new policies that enjoy both worst-case optimality for expected regret and light-tailed risk for regret distribution. Specifically, our policy design (i) enjoys the worst-case optimality for the expected regret at order $O(\sqrt{KT\ln T})$ and (ii) has the worst-case tail probability of incurring a regret larger than any $x>0$ being upp…

    Submitted 22 July, 2024; v1 submitted 6 June, 2022; originally announced June 2022.

    Comments: Preliminary version appeared in NeurIPS 2022

  18. arXiv:2111.10919  [pdf, other]

    cs.LG stat.ML

    Offline Reinforcement Learning: Fundamental Barriers for Value Function Approximation

    Authors: Dylan J. Foster, Akshay Krishnamurthy, David Simchi-Levi, Yunzong Xu

    Abstract: We consider the offline reinforcement learning problem, where the aim is to learn a decision making policy from logged data. Offline RL -- particularly when coupled with (value) function approximation to allow for generalization in large or continuous state spaces -- is becoming increasingly relevant in practice, because it avoids costly and time-consuming online data collection and is well suited…

    Submitted 30 August, 2022; v1 submitted 21 November, 2021; originally announced November 2021.

    Comments: Accepted for presentation at the Conference on Learning Theory (COLT) 2022

  19. arXiv:2111.00790  [pdf, ps, other]

    stat.ML cs.LG

    Dynamic Pricing and Demand Learning on a Large Network of Products: A PAC-Bayesian Approach

    Authors: N. Bora Keskin, David Simchi-Levi, Prem Talwai

    Abstract: We consider a seller offering a large network of $N$ products over a time horizon of $T$ periods. The seller does not know the parameters of the products' linear demand model, and can dynamically adjust product prices to learn the demand model based on sales observations. The seller aims to minimize its pseudo-regret, i.e., the expected revenue loss relative to a clairvoyant who knows the underlyi…

    Submitted 18 December, 2021; v1 submitted 1 November, 2021; originally announced November 2021.

  20. arXiv:2106.14813  [pdf, other]

    stat.ML cs.DM cs.LG math.OC

    Offline Planning and Online Learning under Recovering Rewards

    Authors: David Simchi-Levi, Zeyu Zheng, Feng Zhu

    Abstract: Motivated by emerging applications such as live-streaming e-commerce, promotions and recommendations, we introduce and solve a general class of non-stationary multi-armed bandit problems that have the following two features: (i) the decision maker can pull and collect rewards from up to $K\,(\ge 1)$ out of $N$ different arms in each time period; (ii) the expected reward of an arm immediately drops…

    Submitted 21 December, 2021; v1 submitted 28 June, 2021; originally announced June 2021.

    Comments: v1 accepted by ICML 2021

  21. arXiv:2105.07446  [pdf, ps, other]

    stat.ML cs.LG

    Sobolev Norm Learning Rates for Conditional Mean Embeddings

    Authors: Prem Talwai, Ali Shameli, David Simchi-Levi

    Abstract: We develop novel learning rates for conditional mean embeddings by applying the theory of interpolation for reproducing kernel Hilbert spaces (RKHS). We derive explicit, adaptive convergence rates for the sample estimator under the misspecified setting, where the target operator is not Hilbert-Schmidt or bounded with respect to the input/output RKHSs. We demonstrate that in certain parameter regime…

    Submitted 24 February, 2022; v1 submitted 16 May, 2021; originally announced May 2021.

    Comments: Appears in AISTATS 2022

  22. arXiv:2010.03161  [pdf, other]

    cs.LG cs.AI stat.ML

    Model-Free Non-Stationary RL: Near-Optimal Regret and Applications in Multi-Agent RL and Inventory Control

    Authors: Weichao Mao, Kaiqing Zhang, Ruihao Zhu, David Simchi-Levi, Tamer Başar

    Abstract: We consider model-free reinforcement learning (RL) in non-stationary Markov decision processes. Both the reward functions and the state transition functions are allowed to vary arbitrarily over time as long as their cumulative variations do not exceed certain variation budgets. We propose Restarted Q-Learning with Upper Confidence Bounds (RestartQ-UCB), the first model-free algorithm for non-stati…

    Submitted 19 August, 2022; v1 submitted 7 October, 2020; originally announced October 2020.

    Comments: A preliminary version of this work has appeared in ICML 2021

  23. arXiv:2010.03104  [pdf, other]

    cs.LG math.ST stat.ML

    Instance-Dependent Complexity of Contextual Bandits and Reinforcement Learning: A Disagreement-Based Perspective

    Authors: Dylan J. Foster, Alexander Rakhlin, David Simchi-Levi, Yunzong Xu

    Abstract: In the classical multi-armed bandit problem, instance-dependent algorithms attain improved performance on "easy" problems with a gap between the best and second-best arm. Are similar guarantees possible for contextual bandits? While positive results are known for certain special cases, there is no general theory characterizing when and how instance-dependent regret bounds for contextual bandits ca…

    Submitted 6 October, 2020; originally announced October 2020.

  24. arXiv:2009.12920  [pdf, other]

    cs.CR cs.GT cs.LG stat.ML

    Privacy-Preserving Dynamic Personalized Pricing with Demand Learning

    Authors: Xi Chen, David Simchi-Levi, Yining Wang

    Abstract: The prevalence of e-commerce has made detailed customers' personal information readily accessible to retailers, and this information has been widely used in pricing decisions. When involving personalized information, how to protect the privacy of such information becomes a critical issue in practice. In this paper, we consider a dynamic pricing problem over $T$ time periods with an \emph{unknown}…

    Submitted 25 July, 2021; v1 submitted 27 September, 2020; originally announced September 2020.

    Comments: Final version. Accepted to Management Science

  25. arXiv:2007.00080  [pdf, ps, other]

    cs.LG stat.ML

    Provably More Efficient Q-Learning in the One-Sided-Feedback/Full-Feedback Settings

    Authors: Xiao-Yue Gong, David Simchi-Levi

    Abstract: Motivated by the episodic version of the classical inventory control problem, we propose a new Q-learning-based algorithm, Elimination-Based Half-Q-Learning (HQL), that enjoys improved efficiency over existing algorithms for a wide variety of problems in the one-sided-feedback setting. We also provide a simpler variant of the algorithm, Full-Q-Learning (FQL), for the full-feedback setting. We esta…

    Submitted 2 October, 2020; v1 submitted 30 June, 2020; originally announced July 2020.

  26. arXiv:2006.14389  [pdf, other]

    cs.LG stat.ML

    Reinforcement Learning for Non-Stationary Markov Decision Processes: The Blessing of (More) Optimism

    Authors: Wang Chi Cheung, David Simchi-Levi, Ruihao Zhu

    Abstract: We consider un-discounted reinforcement learning (RL) in Markov decision processes (MDPs) under drifting non-stationarity, i.e., both the reward and state transition distributions are allowed to evolve over time, as long as their respective total variations, quantified by suitable metrics, do not exceed certain variation budgets. We first develop the Sliding Window Upper-Confidence bound for Reinf…

    Submitted 24 June, 2020; originally announced June 2020.

    Comments: To appear in proceedings of the 37th International Conference on Machine Learning. Shortened conference version of its journal version (available at: arXiv:1906.02922)

  27. arXiv:2005.00947  [pdf, other]

    cs.DS cs.LG math.OC stat.AP

    Online Learning and Optimization for Revenue Management Problems with Add-on Discounts

    Authors: David Simchi-Levi, Rui Sun, Huanan Zhang

    Abstract: We study in this paper a revenue management problem with add-on discounts. The problem is motivated by the practice in the video game industry, where a retailer offers discounts on selected supportive products (e.g. video games) to customers who have also purchased the core products (e.g. video game consoles). We formulate this problem as an optimization problem to determine the prices of differen…

    Submitted 2 May, 2020; originally announced May 2020.

  28. arXiv:2003.12699  [pdf, other]

    cs.LG math.ST stat.ML

    Bypassing the Monster: A Faster and Simpler Optimal Algorithm for Contextual Bandits under Realizability

    Authors: David Simchi-Levi, Yunzong Xu

    Abstract: We consider the general (stochastic) contextual bandit problem under the realizability assumption, i.e., the expected reward, as a function of contexts and actions, belongs to a general function class $\mathcal{F}$. We design a fast and simple algorithm that achieves the statistically optimal regret with only ${O}(\log T)$ calls to an offline regression oracle across all $T$ rounds. The number of…

    Submitted 10 July, 2021; v1 submitted 28 March, 2020; originally announced March 2020.

    Comments: Forthcoming in Mathematics of Operations Research

  29. arXiv:1911.01067  [pdf, other]

    cs.LG cs.GT math.OC stat.ML

    Blind Network Revenue Management and Bandits with Knapsacks under Limited Switches

    Authors: David Simchi-Levi, Yunzong Xu, Jinglong Zhao

    Abstract: Our work is motivated by a common business constraint in online markets. While firms respect the advantages of dynamic pricing and price experimentation, they must limit the number of price changes (i.e., switches) to be within some budget due to various practical reasons. We study both the classical price-based network revenue management problem in the distributionally-unknown setup, and the band…

    Submitted 1 December, 2020; v1 submitted 4 November, 2019; originally announced November 2019.

  30. arXiv:1910.08693  [pdf, other]

    cs.LG stat.ML

    Online Pricing with Offline Data: Phase Transition and Inverse Square Law

    Authors: Jinzhi Bu, David Simchi-Levi, Yunzong Xu

    Abstract: This paper investigates the impact of pre-existing offline data on online learning, in the context of dynamic pricing. We study a single-product dynamic pricing problem over a selling horizon of $T$ periods. The demand in each period is determined by the price of the product according to a linear demand model with unknown parameters. We assume that before the start of the selling horizon, the sell…

    Submitted 16 November, 2021; v1 submitted 18 October, 2019; originally announced October 2019.

    Comments: Forthcoming in Management Science

  31. arXiv:1908.09808  [pdf, other]

    cs.DS math.OC

    Multi-stage and Multi-customer Assortment Optimization with Inventory Constraints

    Authors: Elaheh Fata, Will Ma, David Simchi-Levi

    Abstract: We consider an assortment optimization problem where a customer chooses a single item from a sequence of sets shown to her, while limited inventories constrain the items offered to customers over time. In the special case where all of the assortments have size one, our problem captures the online stochastic matching with timeouts problem. For this problem, we derive a polynomial-time approximation…

    Submitted 26 July, 2020; v1 submitted 26 August, 2019; originally announced August 2019.

  32. arXiv:1907.08735  [pdf, other]

    cs.DS

    The Competitive Ratio of Threshold Policies for Online Unit-density Knapsack Problems

    Authors: Will Ma, David Simchi-Levi, Jinglong Zhao

    Abstract: We study a wholesale supply chain ordering problem. In this problem, the supplier has an initial stock, and faces an unpredictable stream of incoming orders, making real-time decisions on whether to accept or reject each order. What makes this wholesale supply chain ordering problem special is its "knapsack constraint," that is, we do not allow partially accepting an order or splitting an order.…

    Submitted 6 April, 2025; v1 submitted 19 July, 2019; originally announced July 2019.

  33. arXiv:1906.02922  [pdf, other]

    cs.LG stat.ML

    Non-Stationary Reinforcement Learning: The Blessing of (More) Optimism

    Authors: Wang Chi Cheung, David Simchi-Levi, Ruihao Zhu

    Abstract: We consider un-discounted reinforcement learning (RL) in Markov decision processes (MDPs) under temporal drifts, i.e., both the reward and state transition distributions are allowed to evolve over time, as long as their respective total variations, quantified by suitable metrics, do not exceed certain variation budgets. This setting captures the endogeneity, exogeneity, uncertainty, and partial feed…

    Submitted 18 May, 2020; v1 submitted 7 June, 2019; originally announced June 2019.

  34. arXiv:1905.10825  [pdf, ps, other]

    cs.LG cs.DS math.OC stat.ML

    Phase Transitions in Bandits with Switching Constraints

    Authors: David Simchi-Levi, Yunzong Xu

    Abstract: We consider the classical stochastic multi-armed bandit problem with a constraint that limits the total cost incurred by switching between actions to be no larger than a given switching budget. For this problem, we prove matching upper and lower bounds on the optimal (i.e., minimax) regret, and provide efficient rate-optimal algorithms. Surprisingly, the optimal regret of this problem exhibits a n…

    Submitted 18 March, 2021; v1 submitted 26 May, 2019; originally announced May 2019.

    Comments: An enhanced version. Many new results are obtained. The presentation is improved

  35. arXiv:1905.04770  [pdf, other]

    cs.DS

    Algorithms for Online Matching, Assortment, and Pricing with Tight Weight-dependent Competitive Ratios

    Authors: Will Ma, David Simchi-Levi

    Abstract: Motivated by the dynamic assortment offerings and item pricings occurring in e-commerce, we study a general problem of allocating finite inventories to heterogeneous customers arriving sequentially. We analyze this problem under the framework of competitive analysis, where the sequence of customers is unknown and does not necessarily follow any pattern. Previous work in this area, studying online…

    Submitted 12 May, 2019; originally announced May 2019.

  36. arXiv:1903.07844  [pdf, other]

    math.OC cs.LG

    Shrinking the Upper Confidence Bound: A Dynamic Product Selection Problem for Urban Warehouses

    Authors: Rong Jin, David Simchi-Levi, Li Wang, Xinshang Wang, Sen Yang

    Abstract: The recent rising popularity of ultra-fast delivery services on retail platforms fuels the increasing use of urban warehouses, whose proximity to customers makes fast deliveries viable. The space limit in urban warehouses poses a problem for the online retailers: the number of products (SKUs) they carry is no longer "the more, the better", yet it can still be significantly large, reaching hundreds…

    Submitted 2 May, 2019; v1 submitted 19 March, 2019; originally announced March 2019.

  37. arXiv:1903.01461  [pdf, other]

    cs.LG stat.ML

    Hedging the Drift: Learning to Optimize under Non-Stationarity

    Authors: Wang Chi Cheung, David Simchi-Levi, Ruihao Zhu

    Abstract: We introduce data-driven decision-making algorithms that achieve state-of-the-art \emph{dynamic regret} bounds for non-stationary bandit settings. These settings capture applications such as advertisement allocation, dynamic pricing, and traffic network routing in changing environments. We show how the difficulty posed by the (unknown \emph{a priori} and possibly adversarial) non-stationarity can…

    Submitted 17 March, 2021; v1 submitted 4 March, 2019; originally announced March 2019.

    Comments: Journal version of the AISTATS 2019 version (available at arXiv:1810.03024). This version fixed an error in the proof of Theorem 2 with Assumption 4 of arXiv:2103.05750

  38. arXiv:1902.10918  [pdf, other]

    cs.LG stat.ML

    Meta Dynamic Pricing: Transfer Learning Across Experiments

    Authors: Hamsa Bastani, David Simchi-Levi, Ruihao Zhu

    Abstract: We study the problem of learning shared structure \emph{across} a sequence of dynamic pricing experiments for related products. We consider a practical formulation where the unknown demand parameters for each product come from an unknown distribution (prior) that is shared across products. We then propose a meta dynamic pricing algorithm that learns this prior online while solving a sequence of Th…

    Submitted 5 January, 2021; v1 submitted 28 February, 2019; originally announced February 2019.

  39. arXiv:1901.02871  [pdf, other]

    math.OC cs.DS cs.LG stat.ML

    The Lingering of Gradients: Theory and Applications

    Authors: Zeyuan Allen-Zhu, David Simchi-Levi, Xinshang Wang

    Abstract: Classically, the time complexity of a first-order method is estimated by its number of gradient computations. In this paper, we study a more refined complexity by taking into account the 'lingering' of gradients: once a gradient is computed at $x_k$, the additional time to compute gradients at $x_{k+1},x_{k+2},\dots$ may be reduced. We show how this improves the running time of several first-ord…

    Submitted 28 May, 2019; v1 submitted 9 January, 2019; originally announced January 2019.

  40. arXiv:1811.01077  [pdf, other]

    cs.DS

    Dynamic Pricing (and Assortment) under a Static Calendar

    Authors: Will Ma, David Simchi-Levi, Jinglong Zhao

    Abstract: This work is motivated by our collaboration with a large consumer packaged goods (CPG) company. We have found that while the company appreciates the advantages of dynamic pricing, they deem it operationally much easier to plan out a static price calendar in advance. We investigate the efficacy of static control policies for revenue management problems whose optimal solution is inherently dynamic…

    Submitted 21 November, 2020; v1 submitted 2 November, 2018; originally announced November 2018.

  41. arXiv:1810.10900  [pdf, other]

    cs.DS cs.GT

    On Policies for Single-leg Revenue Management with Limited Demand Information

    Authors: Will Ma, David Simchi-Levi, Chung-Piaw Teo

    Abstract: In this paper we study the single-item revenue management problem, with no information given about the demand trajectory over time. When the item is sold through accepting/rejecting different fare classes, Ball and Queyranne (2009) have established the tight competitive ratio for this problem using booking limit policies, which raise the acceptance threshold as the remaining inventory dwindles. Ho…

    Submitted 17 January, 2020; v1 submitted 25 October, 2018; originally announced October 2018.

  42. arXiv:1810.05640  [pdf, other]

    cs.AI cs.LG stat.ML

    Inventory Balancing with Online Learning

    Authors: Wang Chi Cheung, Will Ma, David Simchi-Levi, Xinshang Wang

    Abstract: We study a general problem of allocating limited resources to heterogeneous customers over time under model uncertainty. Each type of customer can be serviced using different actions, each of which stochastically consumes some combination of resources, and returns different rewards for the resources consumed. We consider a general model where the resource consumption distribution associated with e…

    Submitted 30 August, 2021; v1 submitted 11 October, 2018; originally announced October 2018.

  43. arXiv:1810.03024  [pdf, other]

    cs.LG stat.ML

    Learning to Optimize under Non-Stationarity

    Authors: Wang Chi Cheung, David Simchi-Levi, Ruihao Zhu

    Abstract: We introduce algorithms that achieve state-of-the-art \emph{dynamic regret} bounds for the non-stationary linear stochastic bandit setting. It captures natural applications such as dynamic pricing and ads allocation in a changing environment. We show how the difficulty posed by the non-stationarity can be overcome by a novel marriage between stochastic and adversarial bandits learning algorithms. Defi…

    Submitted 17 July, 2021; v1 submitted 6 October, 2018; originally announced October 2018.

    Comments: This version fixed an error in the proof of Lemma 1 with Assumption 4 of arXiv:2103.05750

    Journal ref: Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS 2019)

  44. arXiv:1709.03683  [pdf, other]

    cs.LG cs.AI stat.ML

    A Practically Competitive and Provably Consistent Algorithm for Uplift Modeling

    Authors: Yan Zhao, Xiao Fang, David Simchi-Levi

    Abstract: Randomized experiments have been critical tools of decision making for decades. However, subjects can show significant heterogeneity in response to treatments in many important applications. Therefore it is not enough to simply know which treatment is optimal for the entire population. What we need is a model that correctly customizes treatment assignment based on subject characteristics. The proble…

    Submitted 11 September, 2017; originally announced September 2017.

    Comments: Accepted by 2017 IEEE International Conference on Data Mining

  45. arXiv:1705.08492  [pdf, other]

    cs.AI

    Uplift Modeling with Multiple Treatments and General Response Types

    Authors: Yan Zhao, Xiao Fang, David Simchi-Levi

    Abstract: Randomized experiments have been used to assist decision-making in many areas. They help people select the optimal treatment for the test population with a certain statistical guarantee. However, subjects can show significant heterogeneity in response to treatments. The problem of customizing treatment assignment based on subject characteristics is known as uplift modeling, differential response ana…

    Submitted 23 May, 2017; originally announced May 2017.

  46. arXiv:1704.00108  [pdf, ps, other]

    cs.LG

    Assortment Optimization under Unknown MultiNomial Logit Choice Models

    Authors: Wang Chi Cheung, David Simchi-Levi

    Abstract: Motivated by e-commerce, we study the online assortment optimization problem. The seller offers an assortment, i.e. a subset of products, to each arriving customer, who then purchases one or no product from her offered assortment. A customer's purchase decision is governed by the underlying MultiNomial Logit (MNL) choice model. The seller aims to maximize the total revenue in a finite sales horizo…

    Submitted 31 March, 2017; originally announced April 2017.

    Comments: 16 pages, 2 figures

  47. arXiv:1512.02300  [pdf, ps, other]

    cs.GT

    Reaping the Benefits of Bundling under High Production Costs

    Authors: Will Ma, David Simchi-Levi

    Abstract: It is well-known that selling different goods in a single bundle can significantly increase revenue. However, bundling is no longer profitable if the goods have high production costs. To overcome this challenge, we introduce a new mechanism, Pure Bundling with Disposal for Cost (PBDC), where after buying the bundle, the customer is allowed to return any subset of goods for their costs. We provid…

    Submitted 20 February, 2021; v1 submitted 7 December, 2015; originally announced December 2015.