Showing 1–24 of 24 results for author: Dudik, M

  1. arXiv:2411.09730  [pdf, other]

    cs.LG cs.AI stat.AP stat.ML

    SureMap: Simultaneous Mean Estimation for Single-Task and Multi-Task Disaggregated Evaluation

    Authors: Mikhail Khodak, Lester Mackey, Alexandra Chouldechova, Miroslav Dudík

    Abstract: Disaggregated evaluation -- estimation of performance of a machine learning model on different subpopulations -- is a core task when assessing performance and group-fairness of AI systems. A key challenge is that evaluation data is scarce, and subpopulations arising from intersections of attributes (e.g., race, sex, age) are often tiny. Today, it is common for multiple clients to procure the same…

    Submitted 14 November, 2024; originally announced November 2024.

    Comments: NeurIPS 2024

  2. arXiv:2401.14893  [pdf, other]

    cs.LG cs.CY stat.AP stat.ML

    A structured regression approach for evaluating model performance across intersectional subgroups

    Authors: Christine Herlihy, Kimberly Truong, Alexandra Chouldechova, Miroslav Dudik

    Abstract: Disaggregated evaluation is a central task in AI fairness assessment, where the goal is to measure an AI system's performance across different subgroups defined by combinations of demographic or other sensitive attributes. The standard approach is to stratify the evaluation data across subgroups and compute performance metrics separately for each group. However, even for moderately-sized evaluatio…

    Submitted 14 May, 2024; v1 submitted 26 January, 2024; originally announced January 2024.

  3. arXiv:2306.06184  [pdf, other]

    cs.LG stat.ML

    A Unified Model and Dimension for Interactive Estimation

    Authors: Nataly Brukhim, Miroslav Dudik, Aldo Pacchiano, Robert Schapire

    Abstract: We study an abstract framework for interactive learning called interactive estimation in which the goal is to estimate a target from its "similarity" to points queried by the learner. We introduce a combinatorial measure called dissimilarity dimension which largely captures learnability in our model. We present a simple, general, and broadly-applicable algorithm, for which we obtain both regret a…

    Submitted 9 June, 2023; originally announced June 2023.

  4. arXiv:2205.14237  [pdf, other]

    cs.LG cs.AI stat.ML

    Provably Sample-Efficient RL with Side Information about Latent Dynamics

    Authors: Yao Liu, Dipendra Misra, Miro Dudík, Robert E. Schapire

    Abstract: We study reinforcement learning (RL) in settings where observations are high-dimensional, but where an RL agent has access to abstract knowledge about the structure of the state space, as is the case, for example, when a robot is tasked to go to a specific room in a building using observations from its own camera, while having access to the floor plan. We formalize this setting as transfer reinfor…

    Submitted 27 May, 2022; originally announced May 2022.

    Comments: 35 pages, 4 figures

  5. arXiv:2202.05318  [pdf, other]

    stat.ML cs.CR cs.LG math.OC

    Personalization Improves Privacy-Accuracy Tradeoffs in Federated Learning

    Authors: Alberto Bietti, Chen-Yu Wei, Miroslav Dudík, John Langford, Zhiwei Steven Wu

    Abstract: Large-scale machine learning systems often involve data distributed across a collection of users. Federated learning algorithms leverage this structure by communicating model updates to a central server, rather than entire datasets. In this paper, we study stochastic optimization algorithms for a personalized federated learning setting involving local and global models subject to user-level (joint…

    Submitted 15 July, 2022; v1 submitted 10 February, 2022; originally announced February 2022.

    Comments: ICML

  6. arXiv:2107.01509  [pdf, other]

    cs.LG math.ST stat.ML

    Bayesian decision-making under misspecified priors with applications to meta-learning

    Authors: Max Simchowitz, Christopher Tosh, Akshay Krishnamurthy, Daniel Hsu, Thodoris Lykouris, Miroslav Dudík, Robert E. Schapire

    Abstract: Thompson sampling and other Bayesian sequential decision-making algorithms are among the most popular approaches to tackle explore/exploit trade-offs in (contextual) bandits. The choice of prior in these algorithms offers flexibility to encode domain knowledge but can also lead to poor performance when misspecified. In this paper, we demonstrate that performance degrades gracefully with misspecifi…

    Submitted 3 July, 2021; originally announced July 2021.

  7. arXiv:2006.11226  [pdf, other]

    cs.LG math.OC stat.ML

    Gradient descent follows the regularization path for general losses

    Authors: Ziwei Ji, Miroslav Dudík, Robert E. Schapire, Matus Telgarsky

    Abstract: Recent work across many machine learning disciplines has highlighted that standard descent methods, even without explicit regularization, do not merely minimize the training error, but also exhibit an implicit bias. This bias is typically towards a certain regularized solution, and relies upon the details of the learning process, for instance the use of the cross-entropy loss. In this work, we s…

    Submitted 19 June, 2020; originally announced June 2020.

    Comments: To appear, COLT 2020

  8. arXiv:2006.05051  [pdf, other]

    cs.LG cs.AI cs.DS stat.ML

    Constrained episodic reinforcement learning in concave-convex and knapsack settings

    Authors: Kianté Brantley, Miroslav Dudik, Thodoris Lykouris, Sobhan Miryoosefi, Max Simchowitz, Aleksandrs Slivkins, Wen Sun

    Abstract: We propose an algorithm for tabular episodic reinforcement learning with constraints. We provide a modular analysis with strong theoretical guarantees for settings with concave rewards and convex constraints, and for settings with hard constraints (knapsacks). Most of the previous work in constrained reinforcement learning is limited to linear constraints, and the remaining work focuses on either…

    Submitted 5 June, 2021; v1 submitted 9 June, 2020; originally announced June 2020.

    Comments: The NeurIPS 2020 version of this paper includes a small bug, leading to an incorrect dependence on H in Theorem 3.4. This version fixes it by adjusting Eq. (9), Theorem 3.4 and the relevant proofs. Changes in the main text are noted in red. Changes in the appendix are limited to Appendices B.1, B.5, and B.6 and the statement of Lemma F.3

  9. arXiv:1907.09623  [pdf, other]

    cs.LG stat.ML

    Doubly robust off-policy evaluation with shrinkage

    Authors: Yi Su, Maria Dimakopoulou, Akshay Krishnamurthy, Miroslav Dudík

    Abstract: We propose a new framework for designing estimators for off-policy evaluation in contextual bandits. Our approach is based on the asymptotically optimal doubly robust estimator, but we shrink the importance weights to minimize a bound on the mean squared error, which results in a better bias-variance tradeoff in finite samples. We use this optimization-based framework to obtain three estimators: (…

    Submitted 18 September, 2020; v1 submitted 22 July, 2019; originally announced July 2019.

    Journal ref: International Conference on Machine Learning (2020)

  10. arXiv:1906.09323  [pdf, other]

    cs.LG cs.AI cs.GT stat.ML

    Reinforcement Learning with Convex Constraints

    Authors: Sobhan Miryoosefi, Kianté Brantley, Hal Daumé III, Miroslav Dudik, Robert Schapire

    Abstract: In standard reinforcement learning (RL), a learning agent seeks to optimize the overall reward. However, many key aspects of a desired behavior are more naturally expressed as constraints. For instance, the designer may want to limit the use of unsafe actions, increase the diversity of trajectories to enable exploration, or approximate expert trajectories when rewards are sparse. In this paper, we…

    Submitted 11 November, 2019; v1 submitted 21 June, 2019; originally announced June 2019.

    Journal ref: Advances in Neural Information Processing Systems 32 (2019), 14093-14102

  11. arXiv:1905.12843  [pdf, other]

    cs.LG stat.ML

    Fair Regression: Quantitative Definitions and Reduction-based Algorithms

    Authors: Alekh Agarwal, Miroslav Dudík, Zhiwei Steven Wu

    Abstract: In this paper, we study the prediction of a real-valued target, such as a risk score or recidivism rate, while guaranteeing a quantitative notion of fairness with respect to a protected attribute such as gender or race. We call this class of problems fair regression. We propose general schemes for fair regression under two notions of fairness: (1) statistical parity, which asks that the pre…

    Submitted 29 May, 2019; originally announced May 2019.

  12. arXiv:1901.09018  [pdf, other]

    cs.LG stat.ML

    Provably efficient RL with Rich Observations via Latent State Decoding

    Authors: Simon S. Du, Akshay Krishnamurthy, Nan Jiang, Alekh Agarwal, Miroslav Dudík, John Langford

    Abstract: We study the exploration problem in episodic MDPs with rich observations generated from a small number of latent states. Under certain identifiability assumptions, we demonstrate how to estimate a mapping from the observations to latent states inductively through a sequence of regression and clustering steps -- where previously decoded latent states provide labels for later regression problems --…

    Submitted 9 September, 2021; v1 submitted 25 January, 2019; originally announced January 2019.

    Comments: The ICML 2019 version omitted the second constraint on $ε$ in Theorem 4.1. We thank Yonathan Efroni for calling this to our attention

  13. arXiv:1803.01088  [pdf, other]

    cs.LG stat.ML

    Practical Contextual Bandits with Regression Oracles

    Authors: Dylan J. Foster, Alekh Agarwal, Miroslav Dudík, Haipeng Luo, Robert E. Schapire

    Abstract: A major challenge in contextual bandits is to design general-purpose algorithms that are both practically useful and theoretically well-founded. We present a new technique that has the empirical and computational advantages of realizability-based approaches combined with the flexibility of agnostic methods. Our algorithms leverage the availability of a regression oracle for the value-function clas…

    Submitted 2 March, 2018; originally announced March 2018.

  14. arXiv:1803.00590  [pdf, other]

    cs.LG cs.AI stat.ML

    Hierarchical Imitation and Reinforcement Learning

    Authors: Hoang M. Le, Nan Jiang, Alekh Agarwal, Miroslav Dudík, Yisong Yue, Hal Daumé III

    Abstract: We study how to effectively leverage expert feedback to learn sequential decision-making policies. We focus on problems with sparse rewards and long time horizons, which typically pose significant challenges in reinforcement learning. We propose an algorithmic framework, called hierarchical guidance, that leverages the hierarchical structure of the underlying problem to integrate different modes o…

    Submitted 9 June, 2018; v1 submitted 1 March, 2018; originally announced March 2018.

    Comments: Proceedings of the 35th International Conference on Machine Learning (ICML 2018)

  15. arXiv:1612.01205  [pdf, other]

    stat.ML cs.LG

    Optimal and Adaptive Off-policy Evaluation in Contextual Bandits

    Authors: Yu-Xiang Wang, Alekh Agarwal, Miroslav Dudik

    Abstract: We study the off-policy evaluation problem---estimating the value of a target policy using data collected by another policy---under the contextual bandit model. We consider the general (agnostic) setting without access to a consistent model of rewards and establish a minimax lower bound on the mean squared error (MSE). The bound is matched up to constants by the inverse propensity scoring (IPS) an…

    Submitted 11 November, 2017; v1 submitted 4 December, 2016; originally announced December 2016.

    Journal ref: International Conference on Machine Learning (pp. 3589-3597) (2017)

  16. arXiv:1605.04812  [pdf, other]

    cs.LG cs.AI stat.ML

    Off-policy evaluation for slate recommendation

    Authors: Adith Swaminathan, Akshay Krishnamurthy, Alekh Agarwal, Miroslav Dudík, John Langford, Damien Jose, Imed Zitouni

    Abstract: This paper studies the evaluation of policies that recommend an ordered set of items (e.g., a ranking) based on some context---a common scenario in web search, ads, and recommendation. We build on techniques from combinatorial bandits to introduce a new practical estimator that uses logged data to estimate a policy's performance. A thorough empirical evaluation on real-world data reveals that our…

    Submitted 6 November, 2017; v1 submitted 16 May, 2016; originally announced May 2016.

    Comments: 31 pages (9 main paper, 20 supplementary), 12 figures (2 main paper, 10 supplementary)

  17. arXiv:1506.04513  [pdf, other]

    cs.LG stat.ML

    Convex Risk Minimization and Conditional Probability Estimation

    Authors: Matus Telgarsky, Miroslav Dudík, Robert Schapire

    Abstract: This paper proves, in very general settings, that convex risk minimization is a procedure to select a unique conditional probability model determined by the classification problem. Unlike most previous work, we give results that are general enough to include cases in which no minimum exists, as occurs typically, for instance, with standard boosting algorithms. Concretely, we first show that any se…

    Submitted 15 June, 2015; originally announced June 2015.

    Comments: To appear, COLT 2015

  18. arXiv:1503.02834  [pdf, ps, other]

    stat.ME cs.AI

    Doubly Robust Policy Evaluation and Optimization

    Authors: Miroslav Dudík, Dumitru Erhan, John Langford, Lihong Li

    Abstract: We study sequential decision making in environments where rewards are only partially observed, but can be modeled as a function of observed contexts and the chosen action by the decision maker. This setting, known as contextual bandits, encompasses a wide variety of applications such as health care, content recommendation and Internet advertising. A central task is evaluation of a new policy given…

    Submitted 10 March, 2015; originally announced March 2015.

    Comments: Published in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org), at http://dx.doi.org/10.1214/14-STS500

    Report number: IMS-STS-STS500

    Journal ref: Statistical Science 2014, Vol. 29, No. 4, 485-511

  19. arXiv:1502.05890  [pdf, other]

    cs.LG stat.ML

    Contextual Semibandits via Supervised Learning Oracles

    Authors: Akshay Krishnamurthy, Alekh Agarwal, Miroslav Dudik

    Abstract: We study an online decision making problem where on each round a learner chooses a list of items based on some side information, receives a scalar feedback value for each individual item, and a reward that is linearly related to this feedback. These problems, known as contextual semibandits, arise in crowdsourcing, recommendation, and many other domains. This paper reduces contextual semibandits t…

    Submitted 4 November, 2016; v1 submitted 20 February, 2015; originally announced February 2015.

  20. arXiv:1310.8243  [pdf, other]

    cs.LG stat.ML

    Para-active learning

    Authors: Alekh Agarwal, Leon Bottou, Miroslav Dudik, John Langford

    Abstract: Training examples are not all equally informative. Active learning strategies leverage this observation in order to massively reduce the number of examples that need to be labeled. We leverage the same observation to build a generic strategy for parallelizing learning algorithms. This strategy is effective because the search for informative examples is highly parallelizable and because we show tha…

    Submitted 30 October, 2013; originally announced October 2013.

  21. arXiv:1210.4862  [pdf]

    cs.LG stat.ML

    Sample-efficient Nonstationary Policy Evaluation for Contextual Bandits

    Authors: Miroslav Dudik, Dumitru Erhan, John Langford, Lihong Li

    Abstract: We present and prove properties of a new offline policy evaluator for an exploration learning setting which is superior to previous evaluators. In particular, it simultaneously and correctly incorporates techniques from importance weighting, doubly robust evaluation, and nonstationary policy evaluation approaches. In addition, our approach allows generating longer histories by careful control of a…

    Submitted 16 October, 2012; originally announced October 2012.

    Comments: Appears in Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence (UAI2012)

    Report number: UAI-P-2012-PG-247-254

  22. arXiv:1110.4198  [pdf, other]

    cs.LG stat.ML

    A Reliable Effective Terascale Linear Learning System

    Authors: Alekh Agarwal, Olivier Chapelle, Miroslav Dudik, John Langford

    Abstract: We present a system and a set of techniques for learning linear predictors with convex losses on terascale datasets, with trillions of features (here, the number of features refers to the number of non-zero entries in the data matrix), billions of training examples and millions of parameters in an hour using a cluster of 1000 machines. Individually none of the component techniques are new, but the…

    Submitted 11 July, 2013; v1 submitted 19 October, 2011; originally announced October 2011.

  23. arXiv:1106.2369  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    Efficient Optimal Learning for Contextual Bandits

    Authors: Miroslav Dudik, Daniel Hsu, Satyen Kale, Nikos Karampatziakis, John Langford, Lev Reyzin, Tong Zhang

    Abstract: We address the problem of learning in an online setting where the learner repeatedly observes features, selects among a set of actions, and receives reward for the action taken. We provide the first efficient algorithm with an optimal regret. Our algorithm uses a cost sensitive classification learner as an oracle and has a running time $\mathrm{polylog}(N)$, where $N$ is the number of classificati…

    Submitted 12 June, 2011; originally announced June 2011.

  24. arXiv:1103.4601  [pdf, ps, other]

    cs.LG cs.AI cs.RO stat.AP stat.ML

    Doubly Robust Policy Evaluation and Learning

    Authors: Miroslav Dudik, John Langford, Lihong Li

    Abstract: We study decision making in environments where the reward is only partially observed, but can be modeled as a function of an action and an observed context. This setting, known as contextual bandits, encompasses a wide variety of applications including health-care policy and Internet advertising. A central task is evaluation of a new policy given historic data consisting of contexts, actions and r…

    Submitted 5 May, 2011; v1 submitted 23 March, 2011; originally announced March 2011.

    Comments: Published at ICML 2011, 8 pages, 6 figures