Showing 1–50 of 89 results for author: Foster, D

Searching in archive stat.
  1. arXiv:2503.21878  [pdf, other]

    cs.AI cs.LG stat.ML

    Is Best-of-N the Best of Them? Coverage, Scaling, and Optimality in Inference-Time Alignment

    Authors: Audrey Huang, Adam Block, Qinghua Liu, Nan Jiang, Akshay Krishnamurthy, Dylan J. Foster

    Abstract: Inference-time computation offers a powerful axis for scaling the performance of language models. However, naively increasing computation in techniques like Best-of-N sampling can lead to performance degradation due to reward hacking. Toward a theoretical understanding of how to best leverage additional computation, we focus on inference-time alignment, which we formalize as the problem of improvi…

    Submitted 7 April, 2025; v1 submitted 27 March, 2025; originally announced March 2025.

  2. arXiv:2412.01951  [pdf, other]

    cs.AI cs.CL cs.LG stat.ML

    Self-Improvement in Language Models: The Sharpening Mechanism

    Authors: Audrey Huang, Adam Block, Dylan J. Foster, Dhruv Rohatgi, Cyril Zhang, Max Simchowitz, Jordan T. Ash, Akshay Krishnamurthy

    Abstract: Recent work in language modeling has raised the possibility of self-improvement, where a language model evaluates and refines its own generations to achieve higher performance without external feedback. It is impossible for this self-improvement to create information that is not already in the model, so why should we expect that this will lead to improved capabilities? We offer a new perspective…

    Submitted 4 December, 2024; v1 submitted 2 December, 2024; originally announced December 2024.

  3. arXiv:2411.03955  [pdf, ps, other]

    math.PR cs.GT stat.OT

    Large Deviations Inequalities for Unequal Probability Sampling Without Replacement

    Authors: Dean P. Foster, Sergiu Hart

    Abstract: We provide bounds on the tail probabilities for simple procedures that generate random samples _without replacement_, when the probabilities of being selected need not be equal.

    Submitted 6 November, 2024; originally announced November 2024.

  4. arXiv:2410.21676  [pdf, other]

    cs.LG cs.AI math.OC stat.ML

    How Does Critical Batch Size Scale in Pre-training?

    Authors: Hanlin Zhang, Depen Morwani, Nikhil Vyas, Jingfeng Wu, Difan Zou, Udaya Ghai, Dean Foster, Sham Kakade

    Abstract: Training large-scale models under given resources requires careful design of parallelism strategies. In particular, the efficiency notion of critical batch size (CBS), concerning the compromise between time and compute, marks the threshold beyond which greater data parallelism leads to diminishing returns. To operationalize it, we propose a measure of CBS and pre-train a series of auto-regressive…

    Submitted 21 April, 2025; v1 submitted 28 October, 2024; originally announced October 2024.

    Comments: ICLR 2025, Blog post: https://kempnerinstitute.harvard.edu/research/deeper-learning/how-does-critical-batch-size-scale-in-pre-training-decoupling-data-and-model-size

  5. arXiv:2410.17904  [pdf, other]

    cs.LG cs.AI math.OC stat.ML

    Reinforcement Learning under Latent Dynamics: Toward Statistical and Algorithmic Modularity

    Authors: Philip Amortila, Dylan J. Foster, Nan Jiang, Akshay Krishnamurthy, Zakaria Mhammedi

    Abstract: Real-world applications of reinforcement learning often involve environments where agents operate on complex, high-dimensional observations, but the underlying ("latent") dynamics are comparatively simple. However, outside of restrictive settings such as small latent spaces, the fundamental statistical requirements and algorithmic principles for reinforcement learning under latent dynamics are p…

    Submitted 23 October, 2024; originally announced October 2024.

  6. arXiv:2410.05117  [pdf, ps, other]

    cs.LG cs.IT math.ST stat.ML

    Assouad, Fano, and Le Cam with Interaction: A Unifying Lower Bound Framework and Characterization for Bandit Learnability

    Authors: Fan Chen, Dylan J. Foster, Yanjun Han, Jian Qian, Alexander Rakhlin, Yunbei Xu

    Abstract: We develop a unifying framework for information-theoretic lower bounds in statistical estimation and interactive decision making. Classical lower bound techniques -- such as Fano's method, Le Cam's method, and Assouad's lemma -- are central to the study of minimax risk in statistical estimation, yet are insufficient to provide tight lower bounds for interactive decision making algorithms tha…

    Submitted 6 December, 2024; v1 submitted 7 October, 2024; originally announced October 2024.

  7. arXiv:2410.02817  [pdf, other]

    eess.SY cs.LG stat.ML

    Neural Coordination and Capacity Control for Inventory Management

    Authors: Carson Eisenach, Udaya Ghai, Dhruv Madeka, Kari Torkkola, Dean Foster, Sham Kakade

    Abstract: This paper addresses the capacitated periodic review inventory control problem, focusing on a retailer managing multiple products with limited shared resources, such as storage or inbound labor at a facility. Specifically, this paper is motivated by the questions of (1) what does it mean to backtest a capacity control mechanism, (2) can we devise and backtest a capacity control mechanism that is c…

    Submitted 24 September, 2024; originally announced October 2024.

  8. arXiv:2407.15007  [pdf, other]

    cs.LG cs.AI math.ST stat.ML

    Is Behavior Cloning All You Need? Understanding Horizon in Imitation Learning

    Authors: Dylan J. Foster, Adam Block, Dipendra Misra

    Abstract: Imitation learning (IL) aims to mimic the behavior of an expert in a sequential decision making task by learning from demonstrations, and has been widely applied to robotics, autonomous driving, and autoregressive text generation. The simplest approach to IL, behavior cloning (BC), is thought to incur sample complexity with unfavorable quadratic dependence on the problem horizon, motivating a vari…

    Submitted 30 November, 2024; v1 submitted 20 July, 2024; originally announced July 2024.

    Comments: NeurIPS 2024

  9. arXiv:2405.21046  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF

    Authors: Tengyang Xie, Dylan J. Foster, Akshay Krishnamurthy, Corby Rosset, Ahmed Awadallah, Alexander Rakhlin

    Abstract: Reinforcement learning from human feedback (RLHF) has emerged as a central tool for language model alignment. We consider online exploration in RLHF, which exploits interactive access to human or AI feedback by deliberately encouraging the model to produce diverse, maximally informative responses. By allowing RLHF to confidently stray from the pre-trained model, online exploration offers the possi…

    Submitted 31 May, 2024; originally announced May 2024.

  10. arXiv:2404.15417  [pdf, other]

    cs.LG cs.AI stat.ML

    The Power of Resets in Online Reinforcement Learning

    Authors: Zakaria Mhammedi, Dylan J. Foster, Alexander Rakhlin

    Abstract: Simulators are a pervasive tool in reinforcement learning, but most existing algorithms cannot efficiently exploit simulator access -- particularly in high-dimensional domains that require general function approximation. We explore the power of simulators through online reinforcement learning with local simulator access (or, local planning), an RL protocol where the agent is allowed to reset to…

    Submitted 26 April, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

    Comments: Fixed a small typo

  11. arXiv:2404.10122  [pdf, other]

    stat.ML cs.LG math.ST

    Online Estimation via Offline Estimation: An Information-Theoretic Framework

    Authors: Dylan J. Foster, Yanjun Han, Jian Qian, Alexander Rakhlin

    Abstract: $…

    Submitted 15 April, 2024; originally announced April 2024.

  12. arXiv:2403.06571  [pdf, other]

    cs.LG math.OC stat.ML

    Scalable Online Exploration via Coverability

    Authors: Philip Amortila, Dylan J. Foster, Akshay Krishnamurthy

    Abstract: Exploration is a major challenge in reinforcement learning, especially for high-dimensional domains that require function approximation. We propose exploration objectives -- policy optimization objectives that enable downstream maximization of any reward function -- as a conceptual framework to systematize the study of exploration. Within this framework, we introduce a new objective, $L_1$-Coverag…

    Submitted 4 June, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

    Comments: ICML 2024

  13. arXiv:2401.09681  [pdf, other]

    cs.LG stat.ML

    Harnessing Density Ratios for Online Reinforcement Learning

    Authors: Philip Amortila, Dylan J. Foster, Nan Jiang, Ayush Sekhari, Tengyang Xie

    Abstract: The theories of offline and online reinforcement learning, despite having evolved in parallel, have begun to show signs of the possibility for a unification, with algorithms and analysis techniques for one setting often having natural counterparts in the other. However, the notion of density ratio modeling, an emerging paradigm in offline RL, has been largely absent from online RL, perhaps for goo…

    Submitted 4 June, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

    Comments: ICLR 2024

  14. arXiv:2312.16730  [pdf, other]

    cs.LG math.OC math.ST stat.ML

    Foundations of Reinforcement Learning and Interactive Decision Making

    Authors: Dylan J. Foster, Alexander Rakhlin

    Abstract: These lecture notes give a statistical perspective on the foundations of reinforcement learning and interactive decision making. We present a unifying framework for addressing the exploration-exploitation dilemma using frequentist and Bayesian approaches, with connections and parallels between supervised learning/estimation and decision making as an overarching theme. Special attention is paid to…

    Submitted 27 December, 2023; originally announced December 2023.

  15. arXiv:2310.17168  [pdf, other]

    cs.LG stat.ML

    Learning an Inventory Control Policy with General Inventory Arrival Dynamics

    Authors: Sohrab Andaz, Carson Eisenach, Dhruv Madeka, Kari Torkkola, Randy Jia, Dean Foster, Sham Kakade

    Abstract: In this paper we address the problem of learning and backtesting inventory control policies in the presence of general arrival dynamics -- which we term a quantity-over-time arrivals model (QOT). We also allow for order quantities to be modified as a post-processing step to meet vendor constraints such as order minimum and batch size constraints -- a common practice in real supply chains. To th…

    Submitted 21 January, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

  16. arXiv:2310.16096  [pdf, ps, other]

    stat.ML cs.LG

    Contextual Bandits for Evaluating and Improving Inventory Control Policies

    Authors: Dean Foster, Randy Jia, Dhruv Madeka

    Abstract: Solutions to address the periodic review inventory control problem with nonstationary random demand, lost sales, and stochastic vendor lead times typically involve making strong assumptions on the dynamics for either approximation or simulation, and applying methods such as optimization, dynamic programming, or reinforcement learning. Therefore, it is important to analyze and evaluate any inventor…

    Submitted 24 October, 2023; originally announced October 2023.

  17. arXiv:2310.11428  [pdf, other]

    cs.LG math.OC stat.ML

    Butterfly Effects of SGD Noise: Error Amplification in Behavior Cloning and Autoregression

    Authors: Adam Block, Dylan J. Foster, Akshay Krishnamurthy, Max Simchowitz, Cyril Zhang

    Abstract: This work studies training instabilities of behavior cloning with deep neural networks. We observe that minibatch SGD updates to the policy network during training result in sharp oscillations in long-horizon rewards, despite negligibly affecting the behavior cloning loss. We empirically disentangle the statistical and computational causes of these oscillations, and find them to stem from the chao…

    Submitted 17 October, 2023; originally announced October 2023.

  18. arXiv:2307.09423  [pdf, other]

    cs.LG cs.AI stat.ML

    Scaling Laws for Imitation Learning in Single-Agent Games

    Authors: Jens Tuyls, Dhruv Madeka, Kari Torkkola, Dean Foster, Karthik Narasimhan, Sham Kakade

    Abstract: Imitation Learning (IL) is one of the most widely used methods in machine learning. Yet, many works find it is often unable to fully recover the underlying expert behavior, even in constrained environments like single-agent games. However, none of these works deeply investigate the role of scaling up the model and data size. Inspired by recent work in Natural Language Processing (NLP) where "scali…

    Submitted 19 December, 2024; v1 submitted 18 July, 2023; originally announced July 2023.

    Comments: Accepted at TMLR 2024

  19. arXiv:2305.00684  [pdf, other]

    cs.LG cs.AI cs.GT cs.MA stat.ML

    On the Complexity of Multi-Agent Decision Making: From Learning in Games to Partial Monitoring

    Authors: Dylan J. Foster, Dean P. Foster, Noah Golowich, Alexander Rakhlin

    Abstract: A central problem in the theory of multi-agent reinforcement learning (MARL) is to understand what structural conditions and algorithmic principles lead to sample-efficient learning guarantees, and how these considerations change as we move from few to many agents. We study this question in a general framework for interactive decision making with multiple agents, encompassing Markov games with fun…

    Submitted 1 May, 2023; originally announced May 2023.

    Comments: 95 pages

  20. arXiv:2304.12466  [pdf, other]

    cs.LG stat.ML

    Instance-Optimality in Interactive Decision Making: Toward a Non-Asymptotic Theory

    Authors: Andrew Wagenmaker, Dylan J. Foster

    Abstract: We consider the development of adaptive, instance-dependent algorithms for interactive decision making (bandits, reinforcement learning, and beyond) that, rather than only performing well in the worst case, adapt to favorable properties of real-world instances for improved performance. We aim for instance-optimality, a strong notion of adaptivity which asserts that, on any particular problem insta…

    Submitted 24 April, 2023; originally announced April 2023.

  21. arXiv:2303.12287  [pdf, ps, other]

    cs.LG cs.AI cs.GT stat.ML

    Hardness of Independent Learning and Sparse Equilibrium Computation in Markov Games

    Authors: Dylan J. Foster, Noah Golowich, Sham M. Kakade

    Abstract: We consider the problem of decentralized multi-agent reinforcement learning in Markov games. A fundamental question is whether there exist algorithms that, when adopted by all agents and run independently in a decentralized fashion, lead to no-regret for each player, analogous to celebrated convergence results in normal-form games. While recent work has shown that such algorithms exist for restric…

    Submitted 21 March, 2023; originally announced March 2023.

    Comments: 51 pages

  22. arXiv:2301.08215  [pdf, other]

    cs.LG math.OC math.ST stat.ML

    Tight Guarantees for Interactive Decision Making with the Decision-Estimation Coefficient

    Authors: Dylan J. Foster, Noah Golowich, Yanjun Han

    Abstract: A foundational problem in reinforcement learning and interactive decision making is to understand what modeling assumptions lead to sample-efficient learning guarantees, and what algorithm design principles achieve optimal sample complexity. Recently, Foster et al. (2021) introduced the Decision-Estimation Coefficient (DEC), a measure of statistical complexity which leads to upper and lower bounds…

    Submitted 19 January, 2023; originally announced January 2023.

  23. arXiv:2211.14250  [pdf, other]

    cs.LG math.OC math.ST stat.ML

    Model-Free Reinforcement Learning with the Decision-Estimation Coefficient

    Authors: Dylan J. Foster, Noah Golowich, Jian Qian, Alexander Rakhlin, Ayush Sekhari

    Abstract: We consider the problem of interactive decision making, encompassing structured bandits and reinforcement learning with general function approximation. Recently, Foster et al. (2021) introduced the Decision-Estimation Coefficient, a measure of statistical complexity that lower bounds the optimal regret for interactive decision making, as well as a meta-algorithm, Estimation-to-Decisions, which ach…

    Submitted 12 August, 2023; v1 submitted 25 November, 2022; originally announced November 2022.

    Comments: V2 changes: Improved writing and added more examples

  24. arXiv:2211.07484  [pdf, ps, other]

    cs.LG stat.ML

    Contextual Bandits with Packing and Covering Constraints: A Modular Lagrangian Approach via Regression

    Authors: Aleksandrs Slivkins, Xingyu Zhou, Karthik Abinav Sankararaman, Dylan J. Foster

    Abstract: We consider contextual bandits with linear constraints (CBwLC), a variant of contextual bandits in which the algorithm consumes multiple resources subject to linear constraints on total consumption. This problem generalizes contextual bandits with knapsacks (CBwK), allowing for packing and covering constraints, as well as positive and negative resource consumption. We provide the first algorithm f…

    Submitted 26 November, 2024; v1 submitted 14 November, 2022; originally announced November 2022.

    Comments: A preliminary version of this paper, authored by A. Slivkins, K.A. Sankararaman and D.J. Foster, has been published at COLT 2023. The present version (since Jun'24) features an important improvement, due to Xingyu Zhou. The Oct'24 version fixes an inaccuracy in Section 6 when the analysis from Section 4 is invoked

  25. arXiv:2210.07169  [pdf, ps, other]

    econ.TH cs.GT cs.LG math.ST stat.ML

    Forecast Hedging and Calibration

    Authors: Dean P. Foster, Sergiu Hart

    Abstract: Calibration means that forecasts and average realized frequencies are close. We develop the concept of forecast hedging, which consists of choosing the forecasts so as to guarantee that the expected track record can only improve. This yields all the calibration results by the same simple basic argument while differentiating between them by the forecast-hedging tools used: deterministic and fixed p…

    Submitted 13 October, 2022; originally announced October 2022.

    Comments: http://www.ma.huji.ac.il/hart/publ.html#calib-int

    Report number: HUJI DP-731

    Journal ref: Journal of Political Economy 129, 12 (December 2021), 3447-3490

  26. arXiv:2210.07152  [pdf, ps, other]

    econ.TH cs.GT cs.LG math.ST stat.ML

    Smooth Calibration, Leaky Forecasts, Finite Recall, and Nash Dynamics

    Authors: Dean P. Foster, Sergiu Hart

    Abstract: We propose to smooth out the calibration score, which measures how good a forecaster is, by combining nearby forecasts. While regular calibration can be guaranteed only by randomized forecasting procedures, we show that smooth calibration can be guaranteed by deterministic procedures. As a consequence, it does not matter if the forecasts are leaked, i.e., made known in advance: smooth calibration…

    Submitted 13 October, 2022; originally announced October 2022.

    Comments: http://www.ma.huji.ac.il/hart/publ.html#calib-eq

    Report number: HUJI DP-692

    Journal ref: Games and Economic Behavior 109 (May 2018), 271-293

  27. arXiv:2210.04157  [pdf, other]

    cs.LG cs.AI math.OC stat.ML

    The Role of Coverage in Online Reinforcement Learning

    Authors: Tengyang Xie, Dylan J. Foster, Yu Bai, Nan Jiang, Sham M. Kakade

    Abstract: Coverage conditions -- which assert that the data logging distribution adequately covers the state space -- play a fundamental role in determining the sample complexity of offline reinforcement learning. While such conditions might seem irrelevant to online reinforcement learning at first glance, we establish a new connection by showing -- somewhat surprisingly -- that the mere existence of a data…

    Submitted 8 October, 2022; originally announced October 2022.

  28. arXiv:2209.04892  [pdf, ps, other]

    econ.TH cs.GT cs.LG stat.ML

    "Calibeating": Beating Forecasters at Their Own Game

    Authors: Dean P. Foster, Sergiu Hart

    Abstract: In order to identify expertise, forecasters should not be tested by their calibration score, which can always be made arbitrarily small, but rather by their Brier score. The Brier score is the sum of the calibration score and the refinement score; the latter measures how good the sorting into bins with the same forecast is, and thus attests to "expertise." This raises the question of whether one c…

    Submitted 26 October, 2022; v1 submitted 11 September, 2022; originally announced September 2022.

    Comments: http://www.ma.huji.ac.il/hart/publ.html#calib-beat

  29. arXiv:2207.08229  [pdf, other]

    cs.LG cs.RO stat.ML

    Guaranteed Discovery of Control-Endogenous Latent States with Multi-Step Inverse Models

    Authors: Alex Lamb, Riashat Islam, Yonathan Efroni, Aniket Didolkar, Dipendra Misra, Dylan Foster, Lekan Molu, Rajan Chari, Akshay Krishnamurthy, John Langford

    Abstract: In many sequential decision-making tasks, the agent is not able to model the full complexity of the world, which consists of multitudes of relevant and irrelevant information. For example, a person walking along a city street who tries to model all aspects of the world would quickly be overwhelmed by a multitude of shops, cars, and people moving in and out of view, each following their own complex…

    Submitted 27 December, 2022; v1 submitted 17 July, 2022; originally announced July 2022.

    Comments: Project Website: https://controllable-latent-state.github.io/

  30. arXiv:2207.05836  [pdf, other]

    cs.LG stat.ML

    Contextual Bandits with Large Action Spaces: Made Practical

    Authors: Yinglun Zhu, Dylan J. Foster, John Langford, Paul Mineiro

    Abstract: A central problem in sequential decision making is to develop algorithms that are practical and computationally efficient, yet support the use of flexible, general-purpose models. Focusing on the contextual bandit problem, recent progress provides provably efficient algorithms with strong empirical performance when the number of possible alternatives ("actions") is small, but guarantees for decisi…

    Submitted 12 July, 2022; originally announced July 2022.

    Comments: To appear at ICML 2022

  31. arXiv:2206.13063  [pdf, other]

    cs.LG math.OC math.ST stat.ML

    On the Complexity of Adversarial Decision Making

    Authors: Dylan J. Foster, Alexander Rakhlin, Ayush Sekhari, Karthik Sridharan

    Abstract: A central problem in online learning and decision making -- from bandits to reinforcement learning -- is to understand what modeling assumptions lead to sample-efficient learning guarantees. We consider a general adversarial decision making framework that encompasses (structured) bandit problems with adversarial rewards and reinforcement learning problems with adversarial dynamics. Our main result…

    Submitted 27 June, 2022; originally announced June 2022.

  32. arXiv:2206.08364  [pdf, other]

    cs.LG cs.AI cs.HC stat.ML

    Interaction-Grounded Learning with Action-inclusive Feedback

    Authors: Tengyang Xie, Akanksha Saran, Dylan J. Foster, Lekan Molu, Ida Momennejad, Nan Jiang, Paul Mineiro, John Langford

    Abstract: Consider the problem setting of Interaction-Grounded Learning (IGL), in which a learner's goal is to optimally interact with the environment with no explicit reward to ground its policies. The agent observes a context vector, takes an action, and receives a feedback vector, using this information to effectively optimize a policy with respect to a latent reward function. Prior analyzed approaches f…

    Submitted 12 October, 2022; v1 submitted 16 June, 2022; originally announced June 2022.

    Comments: Published in NeurIPS 2022

  33. arXiv:2112.13487  [pdf, other]

    cs.LG math.OC math.ST stat.ML

    The Statistical Complexity of Interactive Decision Making

    Authors: Dylan J. Foster, Sham M. Kakade, Jian Qian, Alexander Rakhlin

    Abstract: A fundamental challenge in interactive learning and decision making, ranging from bandit problems to reinforcement learning, is to provide sample-efficient, adaptive learning algorithms that achieve near-optimal regret. This question is analogous to the classical problem of optimal (supervised) statistical learning, where there are well-known complexity measures (e.g., VC dimension and Rademacher…

    Submitted 11 July, 2023; v1 submitted 26 December, 2021; originally announced December 2021.

    Comments: Minor improvements to writing and organization

  34. arXiv:2112.07602  [pdf, other]

    stat.ME stat.AP stat.ML

    Meta-Analysis of Randomized Experiments with Applications to Heavy-Tailed Response Data

    Authors: Nilesh Tripuraneni, Dhruv Madeka, Dean Foster, Dominique Perrault-Joncas, Michael I. Jordan

    Abstract: A central obstacle in the objective assessment of treatment effect (TE) estimators in randomized control trials (RCTs) is the lack of ground truth (or validation set) to test their performance. In this paper, we propose a novel cross-validation-like methodology to address this challenge. The key insight of our procedure is that the noisy (but unbiased) difference-of-means estimate can be used as a…

    Submitted 9 January, 2023; v1 submitted 14 December, 2021; originally announced December 2021.

  35. arXiv:2111.10919  [pdf, other]

    cs.LG stat.ML

    Offline Reinforcement Learning: Fundamental Barriers for Value Function Approximation

    Authors: Dylan J. Foster, Akshay Krishnamurthy, David Simchi-Levi, Yunzong Xu

    Abstract: We consider the offline reinforcement learning problem, where the aim is to learn a decision making policy from logged data. Offline RL -- particularly when coupled with (value) function approximation to allow for generalization in large or continuous state spaces -- is becoming increasingly relevant in practice, because it avoids costly and time-consuming online data collection and is well suited…

    Submitted 30 August, 2022; v1 submitted 21 November, 2021; originally announced November 2021.

    Comments: Accepted for presentation at the Conference on Learning Theory (COLT) 2022

  36. arXiv:2108.04552  [pdf, other]

    cs.LG math.OC stat.ML

    The Benefits of Implicit Regularization from SGD in Least Squares Problems

    Authors: Difan Zou, Jingfeng Wu, Vladimir Braverman, Quanquan Gu, Dean P. Foster, Sham M. Kakade

    Abstract: Stochastic gradient descent (SGD) exhibits strong algorithmic regularization effects in practice, which has been hypothesized to play an important role in the generalization of modern machine learning approaches. In this work, we seek to understand these issues in the simpler setting of linear regression (including both underparameterized and overparameterized regimes), where our goal is to make s…

    Submitted 10 July, 2022; v1 submitted 10 August, 2021; originally announced August 2021.

    Comments: 33 pages, 1 figure. In NeurIPS 2021

  37. arXiv:2107.05745  [pdf, ps, other]

    cs.LG stat.ML

    Adapting to Misspecification in Contextual Bandits

    Authors: Dylan J. Foster, Claudio Gentile, Mehryar Mohri, Julian Zimmert

    Abstract: A major research direction in contextual bandits is to develop algorithms that are computationally efficient, yet support flexible, general-purpose function approximation. Algorithms based on modeling rewards have shown strong empirical performance, but typically require a well-specified model, and can fail when this assumption does not hold. Can we design algorithms that are efficient and flexibl…

    Submitted 12 July, 2021; originally announced July 2021.

    Comments: Appeared at NeurIPS 2020

  38. arXiv:2107.02237  [pdf, other]

    cs.LG math.ST stat.ML

    Efficient First-Order Contextual Bandits: Prediction, Allocation, and Triangular Discrimination

    Authors: Dylan J. Foster, Akshay Krishnamurthy

    Abstract: A recurring theme in statistical learning, online learning, and beyond is that faster convergence rates are possible for problems with low noise, often quantified by the performance of the best hypothesis; such results are known as first-order or small-loss guarantees. While first-order guarantees are relatively well understood in statistical and online learning, adapting to low noise in contextua…

    Submitted 5 July, 2021; originally announced July 2021.

  39. arXiv:2105.06834  [pdf, other]

    cs.LG stat.ME

    Threshold Martingales and the Evolution of Forecasts

    Authors: Dean P. Foster, Robert A. Stine

    Abstract: This paper introduces a martingale that characterizes two properties of evolving forecast distributions. Ideal forecasts of a future event behave as martingales, sequentially updating the forecast to leverage the available information as the future event approaches. The threshold martingale introduced here measures the proportion of the forecast distribution lying below a threshold. In addition…

    Submitted 14 May, 2021; originally announced May 2021.

  40. arXiv:2104.06970  [pdf, other]

    cs.LG stat.ML

    Understanding the Eluder Dimension

    Authors: Gene Li, Pritish Kamath, Dylan J. Foster, Nathan Srebro

    Abstract: We provide new insights on eluder dimension, a complexity measure that has been extensively used to bound the regret of algorithms for online bandits and reinforcement learning with function approximation. First, we study the relationship between the eluder dimension for a function class and a generalized notion of rank, defined for any monotone "activation" $\sigma: \mathbb{R} \to \mathbb{R}$, which co…

    Submitted 4 October, 2022; v1 submitted 14 April, 2021; originally announced April 2021.

    Comments: NeurIPS 2022

  41. arXiv:2103.02062  [pdf, other]

    cs.LG stat.ML

    Variance Reduced Training with Stratified Sampling for Forecasting Models

    Authors: Yucheng Lu, Youngsuk Park, Lifan Chen, Yuyang Wang, Christopher De Sa, Dean Foster

    Abstract: In large-scale time series forecasting, one often encounters the situation where the temporal patterns of time series, while drifting over time, differ from one another in the same dataset. In this paper, we provably show under such heterogeneity, training a forecasting model with commonly used stochastic optimizers (e.g. SGD) potentially suffers large variance on gradient estimation, and thus inc…

    Submitted 11 June, 2021; v1 submitted 2 March, 2021; originally announced March 2021.

  42. arXiv:2102.07800  [pdf, other]

    stat.ML cs.AI cs.LG

    Top-$k$ eXtreme Contextual Bandits with Arm Hierarchy

    Authors: Rajat Sen, Alexander Rakhlin, Lexing Ying, Rahul Kidambi, Dean Foster, Daniel Hill, Inderjit Dhillon

    Abstract: Motivated by modern applications, such as online advertisement and recommender systems, we study the top-$k$ extreme contextual bandits problem, where the total number of arms can be enormous, and the learner is allowed to select $k$ arms and observe all or some of the rewards for the chosen arms. We first propose an algorithm for the non-extreme realizable setting, utilizing the Inverse Gap Weigh…

    Submitted 15 February, 2021; originally announced February 2021.

  43. arXiv:2010.11895  [pdf, other]

    cs.LG cs.AI math.OC stat.ML

    What are the Statistical Limits of Offline RL with Linear Function Approximation?

    Authors: Ruosong Wang, Dean P. Foster, Sham M. Kakade

    Abstract: Offline reinforcement learning seeks to utilize offline (observational) data to guide the learning of (causal) sequential decision making strategies. The hope is that offline reinforcement learning coupled with function approximation methods (to deal with the curse of dimensionality) can provide a means to help alleviate the excessive sample complexity burden in modern sequential decision making p…

    Submitted 22 October, 2020; originally announced October 2020.

  44. arXiv:2010.03799  [pdf, ps, other]

    cs.LG math.OC math.ST stat.ML

    Learning the Linear Quadratic Regulator from Nonlinear Observations

    Authors: Zakaria Mhammedi, Dylan J. Foster, Max Simchowitz, Dipendra Misra, Wen Sun, Akshay Krishnamurthy, Alexander Rakhlin, John Langford

    Abstract: We introduce a new problem setting for continuous control called the LQR with Rich Observations, or RichLQR. In our setting, the environment is summarized by a low-dimensional continuous latent state with linear dynamics and quadratic costs, but the agent operates on high-dimensional, nonlinear observations such as images from a camera. To enable sample-efficient learning, we assume that the learn…

    Submitted 8 October, 2020; originally announced October 2020.

    Comments: To appear at NeurIPS 2020

  45. arXiv:2010.03104  [pdf, other]

    cs.LG math.ST stat.ML

    Instance-Dependent Complexity of Contextual Bandits and Reinforcement Learning: A Disagreement-Based Perspective

    Authors: Dylan J. Foster, Alexander Rakhlin, David Simchi-Levi, Yunzong Xu

    Abstract: In the classical multi-armed bandit problem, instance-dependent algorithms attain improved performance on "easy" problems with a gap between the best and second-best arm. Are similar guarantees possible for contextual bandits? While positive results are known for certain special cases, there is no general theory characterizing when and how instance-dependent regret bounds for contextual bandits ca…

    Submitted 6 October, 2020; originally announced October 2020.

  46. arXiv:2007.01160  [pdf, ps, other]

    cs.LG stat.ML

    Tight Bounds on Minimax Regret under Logarithmic Loss via Self-Concordance

    Authors: Blair Bilodeau, Dylan J. Foster, Daniel M. Roy

    Abstract: We consider the classical problem of sequential probability assignment under logarithmic loss while competing against an arbitrary, potentially nonparametric class of experts. We obtain tight bounds on the minimax regret via a new approach that exploits the self-concordance property of the logarithmic loss. We show that for any expert class with (sequential) metric entropy $\mathcal{O}(\gamma^{-p})$ at…

    Submitted 3 August, 2020; v1 submitted 2 July, 2020; originally announced July 2020.

    Comments: 25 pages

    Journal ref: Proceedings of the 37th International Conference on Machine Learning, ICML 2020
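    Illustrative sketch (not from the paper): for a finite class of $N$ experts and binary outcomes, the uniform Bayesian mixture is a classical baseline whose log-loss regret is at most $\ln N$ on every sequence, because its cumulative loss equals $-\ln\big(\tfrac{1}{N}\sum_i \prod_t p_i(y_t)\big)$. The paper's results concern richer, possibly nonparametric classes, which this sketch does not cover. The function name and the input layout (per-expert lists of probabilities that each outcome is 1, all strictly between 0 and 1) are assumptions.

```python
import math


def mixture_log_loss_regret(expert_probs, outcomes):
    """Regret of the uniform Bayes mixture against the best expert in hindsight.

    expert_probs[i][t] is expert i's probability that outcomes[t] == 1.
    Returns (mixture cumulative log loss) - (best expert's cumulative log loss).
    """
    n = len(expert_probs)
    log_w = [0.0] * n  # unnormalized log posterior weights, uniform prior
    mix_loss = 0.0
    expert_loss = [0.0] * n
    for t, y in enumerate(outcomes):
        # Normalize weights in a numerically stable way.
        top = max(log_w)
        w = [math.exp(lw - top) for lw in log_w]
        z = sum(w)
        p_one = sum(wi * expert_probs[i][t] for i, wi in enumerate(w)) / z
        mix_loss -= math.log(p_one if y == 1 else 1.0 - p_one)
        # Bayesian update: each weight is multiplied by the expert's likelihood.
        for i in range(n):
            q = expert_probs[i][t] if y == 1 else 1.0 - expert_probs[i][t]
            expert_loss[i] -= math.log(q)
            log_w[i] += math.log(q)
    return mix_loss - min(expert_loss)
```

    On any outcome sequence the returned regret should lie between $0$ and $\ln N$.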

  47. arXiv:2006.13476  [pdf, other]

    cs.LG math.OC stat.ML

    Second-Order Information in Non-Convex Stochastic Optimization: Power and Limitations

    Authors: Yossi Arjevani, Yair Carmon, John C. Duchi, Dylan J. Foster, Ayush Sekhari, Karthik Sridharan

    Abstract: We design an algorithm that finds an $\epsilon$-approximate stationary point (with $\|\nabla F(x)\|\le \epsilon$) using $O(\epsilon^{-3})$ stochastic gradient and Hessian-vector products, matching guarantees that were previously available only under a stronger assumption of access to multiple queries with the same random seed. We prove a lower bound which establishes that this rate is optimal and, surprisingly, tha…

    Submitted 24 June, 2020; originally announced June 2020.

    Comments: Accepted to the Conference on Learning Theory (COLT) 2020

  48. arXiv:2006.10940  [pdf, ps, other]

    cs.LG stat.ML

    Open Problem: Model Selection for Contextual Bandits

    Authors: Dylan J. Foster, Akshay Krishnamurthy, Haipeng Luo

    Abstract: In statistical learning, algorithms for model selection allow the learner to adapt to the complexity of the best hypothesis class in a sequence. We ask whether similar guarantees are possible for contextual bandit learning.

    Submitted 18 June, 2020; originally announced June 2020.

    Comments: COLT 2020 open problem

  49. arXiv:2004.14681  [pdf, other]

    cs.LG math.OC math.ST stat.ML

    Learning nonlinear dynamical systems from a single trajectory

    Authors: Dylan J. Foster, Alexander Rakhlin, Tuhin Sarkar

    Abstract: We introduce algorithms for learning nonlinear dynamical systems of the form $x_{t+1}=\sigma(\Theta^{\star}x_t)+\varepsilon_t$, where $\Theta^{\star}$ is a weight matrix, $\sigma$ is a nonlinear link function, and $\varepsilon_t$ is a mean-zero noise process. We give an algorithm that recovers the weight matrix $\Theta^{\star}$ from a single trajectory with optimal sample complexity and linear running time. The algorithm…

    Submitted 30 April, 2020; originally announced April 2020.

    Comments: To appear at L4DC 2020
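    Illustrative sketch (not from the paper): the Python below only simulates the model in the abstract, $x_{t+1}=\sigma(\Theta^{\star}x_t)+\varepsilon_t$, to show what a single trajectory looks like; it does not implement the authors' estimator. The ReLU link, the Gaussian noise (the abstract only requires mean-zero noise), and the function names are assumptions.

```python
import random


def relu(z):
    """Assumed link function sigma (ReLU), applied coordinate-wise."""
    return max(z, 0.0)


def simulate(theta, x0, steps, noise_std, rng, link=relu):
    """Roll out x_{t+1} = link(theta x_t) + eps_t with i.i.d. Gaussian eps_t.

    theta is a list of rows (a d x d matrix); returns [x_0, x_1, ..., x_steps].
    """
    x = list(x0)
    trajectory = [x]
    for _ in range(steps):
        # Matrix-vector product theta @ x, row by row.
        pre = [sum(row[j] * x[j] for j in range(len(x))) for row in theta]
        x = [link(p) + rng.gauss(0.0, noise_std) for p in pre]
        trajectory.append(x)
    return trajectory
```

    The consecutive pairs $(x_t, x_{t+1})$ of one rollout are the only data a single-trajectory learner observes, and unlike i.i.d. regression samples they are dependent.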

  50. arXiv:2004.13105  [pdf, other]

    stat.ME physics.ao-ph physics.data-an

    A Bayesian approach to regional decadal predictability: Sparse parameter estimation in high-dimensional linear inverse models of high-latitude sea surface temperature variability

    Authors: Dallas Foster, Darin Comeau, Nathan M. Urban

    Abstract: Stochastic reduced models are an important tool for climate systems whose many spatial and temporal scales cannot be fully discretized, or whose underlying physics may not be fully accounted for. One form of reduced model, the linear inverse model (LIM), has been widely used for regional climate predictability studies, typically focusing more on tropical or mid-latitude regions. However, most LIM fitting…

    Submitted 27 April, 2020; originally announced April 2020.

    Comments: This work has been accepted to the Journal of Climate. The AMS does not guarantee that the copy provided here is an accurate copy of the final published work.

    Report number: LA-UR-19-30206