Showing 1–50 of 57 results for author: Mansour, Y

Searching in archive stat.
  1. arXiv:2502.11033  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Convergence of Policy Mirror Descent Beyond Compatible Function Approximation

    Authors: Uri Sherman, Tomer Koren, Yishay Mansour

    Abstract: Modern policy optimization methods roughly follow the policy mirror descent (PMD) algorithmic template, for which there are by now numerous theoretical convergence results. However, most of these either target tabular environments, or can be applied effectively only when the class of policies being optimized over satisfies strong closure conditions, which is typically not the case when working wit…

    Submitted 23 March, 2025; v1 submitted 16 February, 2025; originally announced February 2025.

  2. arXiv:2412.08012  [pdf, other]

    cs.LG stat.ML

    Of Dice and Games: A Theory of Generalized Boosting

    Authors: Marco Bressan, Nataly Brukhim, Nicolò Cesa-Bianchi, Emmanuel Esposito, Yishay Mansour, Shay Moran, Maximilian Thiessen

    Abstract: Cost-sensitive loss functions are crucial in many real-world prediction problems, where different types of errors are penalized differently; for example, in medical diagnosis, a false negative prediction can lead to worse consequences than a false positive prediction. However, traditional PAC learning theory has mostly focused on the symmetric 0-1 loss, leaving cost-sensitive losses largely unaddr…

    Submitted 10 December, 2024; originally announced December 2024.

  3. arXiv:2411.13029  [pdf, other]

    cs.LG stat.ML

    Probably Approximately Precision and Recall Learning

    Authors: Lee Cohen, Yishay Mansour, Shay Moran, Han Shao

    Abstract: Precision and Recall are foundational metrics in machine learning where both accurate predictions and comprehensive coverage are essential, such as in recommender systems and multi-label learning. In these tasks, balancing precision (the proportion of relevant items among those predicted) and recall (the proportion of relevant items successfully predicted) is crucial. A key challenge is that one-s…

    Submitted 19 November, 2024; originally announced November 2024.

  4. arXiv:2411.06501  [pdf, ps, other]

    cs.LG stat.ML

    Individual Regret in Cooperative Stochastic Multi-Armed Bandits

    Authors: Idan Barnea, Tal Lancewicki, Yishay Mansour

    Abstract: We study the regret in stochastic Multi-Armed Bandits (MAB) with multiple agents that communicate over an arbitrary connected communication graph. We show a near-optimal individual regret bound of $\tilde{O}(\sqrt{AT/m}+A)$, where $A$ is the number of actions, $T$ the time horizon, and $m$ the number of agents. In particular, assuming a sufficient number of agents, we achieve a regret bound of…

    Submitted 10 November, 2024; originally announced November 2024.

    Comments: 42 pages, 1 figure

  5. arXiv:2409.08570  [pdf, other]

    cs.LG stat.ML

    Batch Ensemble for Variance Dependent Regret in Stochastic Bandits

    Authors: Asaf Cassel, Orin Levy, Yishay Mansour

    Abstract: Efficiently trading off exploration and exploitation is one of the key challenges in online Reinforcement Learning (RL). Most works achieve this by carefully estimating the model uncertainty and following the so-called optimistic model. Inspired by practical ensemble methods, in this work we propose a simple and novel batch ensemble scheme that provably achieves near-optimal regret for stochastic…

    Submitted 13 September, 2024; originally announced September 2024.

  6. arXiv:2407.02279  [pdf, other]

    cs.LG stat.ML

    How to Boost Any Loss Function

    Authors: Richard Nock, Yishay Mansour

    Abstract: Boosting is a highly successful ML-born optimization setting in which one is required to computationally efficiently learn arbitrarily good models based on the access to a weak learner oracle, providing classifiers performing at least slightly differently from random guessing. A key difference with gradient-based optimization is that boosting's original model does not require access to first orde…

    Submitted 14 November, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

    Comments: NeurIPS'24

    ACM Class: I.2.6

  7. arXiv:2406.12406  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    Fast Rates for Bandit PAC Multiclass Classification

    Authors: Liad Erez, Alon Cohen, Tomer Koren, Yishay Mansour, Shay Moran

    Abstract: We study multiclass PAC learning with bandit feedback, where inputs are classified into one of $K$ possible labels and feedback is limited to whether or not the predicted labels are correct. Our main contribution is in designing a novel learning algorithm for the agnostic $(\varepsilon,δ)$-PAC version of the problem, with sample complexity of…

    Submitted 18 June, 2024; originally announced June 2024.

  8. arXiv:2406.10529  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    A Theory of Interpretable Approximations

    Authors: Marco Bressan, Nicolò Cesa-Bianchi, Emmanuel Esposito, Yishay Mansour, Shay Moran, Maximilian Thiessen

    Abstract: Can a deep neural network be approximated by a small decision tree based on simple features? This question and its variants are behind the growing demand for machine learning models that are *interpretable* by humans. In this work we study such questions by introducing *interpretable approximations*, a notion that captures the idea of approximating a target concept $c$ by a small aggregation of co…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: To appear at COLT 2024

  9. arXiv:2406.07585  [pdf, other]

    stat.ML cs.LG

    Rate-Preserving Reductions for Blackwell Approachability

    Authors: Christoph Dann, Yishay Mansour, Mehryar Mohri, Jon Schneider, Balasubramanian Sivan

    Abstract: Abernethy et al. (2011) showed that Blackwell approachability and no-regret learning are equivalent, in the sense that any algorithm that solves a specific Blackwell approachability instance can be converted to a sublinear regret algorithm for a specific no-regret learning instance, and vice versa. In this paper, we study a more fine-grained form of such reductions, and ask when this translation b…

    Submitted 17 July, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

  10. arXiv:2405.10027  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    The Real Price of Bandit Information in Multiclass Classification

    Authors: Liad Erez, Alon Cohen, Tomer Koren, Yishay Mansour, Shay Moran

    Abstract: We revisit the classical problem of multiclass classification with bandit feedback (Kakade, Shalev-Shwartz and Tewari, 2008), where each input classifies to one of $K$ possible labels and feedback is restricted to whether the predicted label is correct or not. Our primary inquiry is with regard to the dependency on the number of labels $K$, and whether $T$-step regret bounds in this setting can be…

    Submitted 19 June, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

  11. arXiv:2301.13087  [pdf, ps, other]

    cs.LG stat.ML

    Improved Regret for Efficient Online Reinforcement Learning with Linear Function Approximation

    Authors: Uri Sherman, Tomer Koren, Yishay Mansour

    Abstract: We study reinforcement learning with linear function approximation and adversarially changing cost functions, a setup that has mostly been considered under simplifying assumptions such as full information feedback or exploratory conditions. We present a computationally efficient policy optimization algorithm for the challenging general setting of unknown dynamics and bandit feedback, featuring a co…

    Submitted 30 January, 2023; originally announced January 2023.

  12. arXiv:2212.04216  [pdf, ps, other]

    cs.LG cs.CR stat.ML

    Differentially-Private Bayes Consistency

    Authors: Olivier Bousquet, Haim Kaplan, Aryeh Kontorovich, Yishay Mansour, Shay Moran, Menachem Sadigurschi, Uri Stemmer

    Abstract: We construct a universally Bayes consistent learning rule that satisfies differential privacy (DP). We first handle the setting of binary classification and then extend our rule to the more general setting of density estimation (with respect to the total variation metric). The existence of a universally consistent DP learner reveals a stark difference with the distribution-free PAC model. Indeed,…

    Submitted 8 December, 2022; originally announced December 2022.

  13. arXiv:2207.14211  [pdf, ps, other]

    cs.LG cs.AI cs.GT stat.ML

    Regret Minimization and Convergence to Equilibria in General-sum Markov Games

    Authors: Liad Erez, Tal Lancewicki, Uri Sherman, Tomer Koren, Yishay Mansour

    Abstract: An abundance of recent impossibility results establish that regret minimization in Markov games with adversarial opponents is both statistically and computationally intractable. Nevertheless, none of these results preclude the possibility of regret minimization under the assumption that all parties adopt the same learning procedure. In this work, we present the first (to our knowledge) algorithm f…

    Submitted 8 August, 2022; v1 submitted 28 July, 2022; originally announced July 2022.

  14. arXiv:2203.13423  [pdf, ps, other]

    cs.LG cs.IR stat.ML

    Modeling Attrition in Recommender Systems with Departing Bandits

    Authors: Omer Ben-Porat, Lee Cohen, Liu Leqi, Zachary C. Lipton, Yishay Mansour

    Abstract: Traditionally, when recommender systems are formalized as multi-armed bandits, the policy of the recommender system influences the rewards accrued, but not the length of interaction. However, in real-world systems, dissatisfied users may depart (and never come back). In this work, we propose a novel multi-armed bandit setup that captures such policy-dependent horizons. Our setup consists of a fini…

    Submitted 15 February, 2024; v1 submitted 24 March, 2022; originally announced March 2022.

    Comments: Accepted at AAAI 2022

  15. arXiv:2202.13361  [pdf, other]

    cs.LG math.OC stat.ML

    Benign Underfitting of Stochastic Gradient Descent

    Authors: Tomer Koren, Roi Livni, Yishay Mansour, Uri Sherman

    Abstract: We study to what extent stochastic gradient descent (SGD) may be understood as a "conventional" learning rule that achieves generalization performance by obtaining a good fit to training data. We consider the fundamental stochastic convex optimization framework, where (one pass, without-replacement) SGD is classically known to minimize the population risk at rate $O(1/\sqrt n)$, and prove that, su…

    Submitted 12 January, 2023; v1 submitted 27 February, 2022; originally announced February 2022.

  16. arXiv:2202.11593  [pdf, other]

    cs.LG cs.AI cs.DS stat.ML

    Finding Safe Zones of policies Markov Decision Processes

    Authors: Lee Cohen, Yishay Mansour, Michal Moshkovitz

    Abstract: Given a policy of a Markov Decision Process, we define a SafeZone as a subset of states, such that most of the policy's trajectories are confined to this subset. The quality of a SafeZone is parameterized by the number of states and the escape probability, i.e., the probability that a random trajectory will leave the subset. SafeZones are especially interesting when they have a small number of sta…

    Submitted 9 October, 2023; v1 submitted 23 February, 2022; originally announced February 2022.

    Comments: NeurIPS 2023

  17. arXiv:2202.05420  [pdf, other]

    cs.LG stat.ML

    A Characterization of Semi-Supervised Adversarially-Robust PAC Learnability

    Authors: Idan Attias, Steve Hanneke, Yishay Mansour

    Abstract: We study the problem of learning an adversarially robust predictor to test-time attacks in the semi-supervised PAC model. We address the question of how many labeled and unlabeled examples are required to ensure learning. We show that having enough unlabeled data (the size of a labeled sample that a fully-supervised method would require), the labeled sample complexity can be arbitrarily smaller co…

    Submitted 5 May, 2024; v1 submitted 10 February, 2022; originally announced February 2022.

    Comments: NeurIPS 2022 camera-ready

  18. arXiv:2201.12947  [pdf, other]

    stat.ML cs.LG

    Fair Wrapping for Black-box Predictions

    Authors: Alexander Soen, Ibrahim Alabdulmohsin, Sanmi Koyejo, Yishay Mansour, Nyalleng Moorosi, Richard Nock, Ke Sun, Lexing Xie

    Abstract: We introduce a new family of techniques to post-process ("wrap") a black-box classifier in order to reduce its bias. Our technique builds on the recent analysis of improper loss functions whose optimization can correct any twist in prediction, unfairness being treated as a twist. In the post-processing, we learn a wrapper function which we define as an $α$-tree, which modifies the prediction. We p…

    Submitted 1 November, 2022; v1 submitted 30 January, 2022; originally announced January 2022.

    Comments: Published in Advances in Neural Information Processing Systems 35 (NeurIPS 2022)

  19. arXiv:2107.02738  [pdf, ps, other]

    cs.LG cs.DS stat.ML

    Dueling Bandits with Team Comparisons

    Authors: Lee Cohen, Ulrike Schmidt-Kraepelin, Yishay Mansour

    Abstract: We introduce the dueling teams problem, a new online-learning setting in which the learner observes noisy comparisons of disjoint pairs of $k$-sized teams from a universe of $n$ players. The goal of the learner is to minimize the number of duels required to identify, with high probability, a Condorcet winning team, i.e., a team which wins against any other disjoint team (with probability at least…

    Submitted 6 July, 2021; originally announced July 2021.

  20. arXiv:2106.15207  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Optimal Rates for Random Order Online Optimization

    Authors: Uri Sherman, Tomer Koren, Yishay Mansour

    Abstract: We study online convex optimization in the random order model, recently proposed by Garber et al. (2020), where the loss functions may be chosen by an adversary, but are then presented to the online algorithm in a uniformly random order. Focusing on the scenario where the cumulative loss function is (strongly) convex, yet individual loss functions are smooth but might be non-convex, we give al…

    Submitted 29 June, 2021; originally announced June 2021.

  21. arXiv:2102.00490  [pdf, ps, other]

    cs.LG stat.ML

    Online Markov Decision Processes with Aggregate Bandit Feedback

    Authors: Alon Cohen, Haim Kaplan, Tomer Koren, Yishay Mansour

    Abstract: We study a novel variant of online finite-horizon Markov Decision Processes with adversarially changing loss functions and initially unknown dynamics. In each episode, the learner suffers the loss accumulated along the trajectory realized by the policy chosen for the episode, and observes aggregate bandit feedback: the trajectory is revealed along with the cumulative loss suffered, rather than the…

    Submitted 31 January, 2021; originally announced February 2021.

  22. arXiv:2010.14563  [pdf, ps, other]

    cs.LG stat.ML

    Adversarial Dueling Bandits

    Authors: Aadirupa Saha, Tomer Koren, Yishay Mansour

    Abstract: We introduce the problem of regret minimization in Adversarial Dueling Bandits. As in classic Dueling Bandits, the learner has to repeatedly choose a pair of items and observe only a relative binary 'win-loss' feedback for this pair, but here this feedback is generated from an arbitrary preference matrix, possibly chosen adversarially. Our main result is an algorithm whose $T$-round regret compare…

    Submitted 27 October, 2020; originally announced October 2020.

    Comments: 26 pages

  23. arXiv:2010.00917  [pdf, ps, other]

    cs.LG cs.DS stat.ML

    The Sparse Vector Technique, Revisited

    Authors: Haim Kaplan, Yishay Mansour, Uri Stemmer

    Abstract: We revisit one of the most basic and widely applicable techniques in the literature of differential privacy - the sparse vector technique [Dwork et al., STOC 2009]. This simple algorithm privately tests whether the value of a given query on a database is close to what we expect it to be. It allows asking an unbounded number of queries as long as the answer is close to what we expect, and halts fol…

    Submitted 16 November, 2020; v1 submitted 2 October, 2020; originally announced October 2020.

  24. arXiv:2009.05986  [pdf, other]

    cs.LG stat.ML

    Oracle-Efficient Regret Minimization in Factored MDPs with Unknown Structure

    Authors: Aviv Rosenberg, Yishay Mansour

    Abstract: We study regret minimization in non-episodic factored Markov decision processes (FMDPs), where all existing algorithms make the strong assumption that the factored structure of the FMDP is known to the learner in advance. In this paper, we provide the first algorithm that learns the structure of the FMDP while minimizing the regret. Our algorithm is based on the optimism in face of uncertainty pri…

    Submitted 11 October, 2021; v1 submitted 13 September, 2020; originally announced September 2020.

    Comments: NeurIPS 2021

  25. arXiv:2008.09490  [pdf, other]

    cs.LG stat.ML

    Beyond Individual and Group Fairness

    Authors: Pranjal Awasthi, Corinna Cortes, Yishay Mansour, Mehryar Mohri

    Abstract: We present a new data-driven model of fairness that, unlike existing static definitions of individual or group fairness, is guided by the unfairness complaints received by the system. Our model supports multiple fairness criteria and takes into account their potential incompatibilities. We consider both a stochastic and an adversarial setting of our model. In the stochastic setting, we show that ou…

    Submitted 21 August, 2020; originally announced August 2020.

  26. arXiv:2007.09762  [pdf, other]

    cs.LG stat.ML

    A Theory of Multiple-Source Adaptation with Limited Target Labeled Data

    Authors: Yishay Mansour, Mehryar Mohri, Jae Ro, Ananda Theertha Suresh, Ke Wu

    Abstract: We present a theoretical and algorithmic study of the multiple-source domain adaptation problem in the common scenario where the learner has access only to a limited amount of labeled target data, but where the learner has at its disposal a large amount of labeled data from multiple source domains. We show that a new family of algorithms based on model selection ideas benefits from very favorable guar…

    Submitted 29 October, 2020; v1 submitted 19 July, 2020; originally announced July 2020.

    Comments: 20 pages

  27. arXiv:2006.11561  [pdf, ps, other]

    cs.LG stat.ML

    Stochastic Shortest Path with Adversarially Changing Costs

    Authors: Aviv Rosenberg, Yishay Mansour

    Abstract: Stochastic shortest path (SSP) is a well-known problem in planning and control, in which an agent has to reach a goal state in minimum total expected cost. In this paper we present the adversarial SSP model that also accounts for adversarial changes in the costs over time, while the underlying transition function remains unchanged. Formally, an agent interacts with an SSP environment for $K$ episo…

    Submitted 5 April, 2022; v1 submitted 20 June, 2020; originally announced June 2020.

  28. arXiv:2005.03789  [pdf, other]

    cs.LG cs.AI stat.ML

    Reinforcement Learning with Feedback Graphs

    Authors: Christoph Dann, Yishay Mansour, Mehryar Mohri, Ayush Sekhari, Karthik Sridharan

    Abstract: We study episodic reinforcement learning in Markov decision processes when the agent receives additional feedback per step in the form of several transition observations. Such additional observations are available in a range of tasks through extended sensors or prior knowledge about the environment (e.g., when certain actions yield similar outcome). We formalize this setting using a feedback graph…

    Submitted 7 May, 2020; originally announced May 2020.

  29. arXiv:2005.01757  [pdf, ps, other]

    cs.LG cs.DS stat.ML

    Sample Complexity of Uniform Convergence for Multicalibration

    Authors: Eliran Shabat, Lee Cohen, Yishay Mansour

    Abstract: There is a growing interest in societal concerns in machine learning systems, especially in fairness. Multicalibration gives a comprehensive methodology to address group fairness. In this work, we address the multicalibration error and decouple it from the prediction error. The importance of decoupling the fairness metric (multicalibration) and the accuracy (prediction error) is due to the inheren…

    Submitted 7 June, 2021; v1 submitted 4 May, 2020; originally announced May 2020.

    Comments: NeurIPS 2020

    MSC Class: 68Q32; ACM Class: I.2.6

  30. arXiv:2004.07839  [pdf, ps, other]

    cs.LG cs.CR cs.DS stat.ML

    Private Learning of Halfspaces: Simplifying the Construction and Reducing the Sample Complexity

    Authors: Haim Kaplan, Yishay Mansour, Uri Stemmer, Eliad Tsfadia

    Abstract: We present a differentially private learner for halfspaces over a finite grid $G$ in $\mathbb{R}^d$ with sample complexity $\approx d^{2.5}\cdot 2^{\log^*|G|}$, which improves the state-of-the-art result of [Beimel et al., COLT 2019] by a $d^2$ factor. The building block for our learner is a new differentially private algorithm for approximately solving the linear feasibility problem: Given a feas…

    Submitted 3 November, 2020; v1 submitted 16 April, 2020; originally announced April 2020.

    Comments: Accepted to NeurIPS 2020. In this version we added a new section about our new method for privately optimizing high-dimensional functions. arXiv admin note: text overlap with arXiv:1902.10731

  31. arXiv:2002.10619  [pdf, other]

    cs.LG stat.ML

    Three Approaches for Personalization with Applications to Federated Learning

    Authors: Yishay Mansour, Mehryar Mohri, Jae Ro, Ananda Theertha Suresh

    Abstract: The standard objective in machine learning is to train a single model for all users. However, in many learning scenarios, such as cloud computing and federated learning, it is possible to learn a personalized model per user. In this work, we present a systematic learning-theoretic study of personalization. We propose and analyze three approaches: user clustering, data interpolation, and model inte…

    Submitted 19 July, 2020; v1 submitted 24 February, 2020; originally announced February 2020.

    Comments: 24 pages

  32. arXiv:2002.10286  [pdf, other]

    cs.LG stat.ML

    Prediction with Corrupted Expert Advice

    Authors: Idan Amir, Idan Attias, Tomer Koren, Roi Livni, Yishay Mansour

    Abstract: We revisit the fundamental problem of prediction with expert advice, in a setting where the environment is benign and generates losses stochastically, but the feedback observed by the learner is subject to a moderate adversarial corruption. We prove that a variant of the classical Multiplicative Weights algorithm with decreasing step sizes achieves constant regret in this setting and performs opti…

    Submitted 20 October, 2020; v1 submitted 24 February, 2020; originally announced February 2020.

    Comments: NeurIPS 2020 Camera Ready

    Journal ref: Conference on Neural Information Processing Systems 2020

  33. arXiv:2002.09869  [pdf, ps, other]

    cs.LG stat.ML

    Near-optimal Regret Bounds for Stochastic Shortest Path

    Authors: Alon Cohen, Haim Kaplan, Yishay Mansour, Aviv Rosenberg

    Abstract: Stochastic shortest path (SSP) is a well-known problem in planning and control, in which an agent has to reach a goal state in minimum total expected cost. In the learning formulation of the problem, the agent is unaware of the environment dynamics (i.e., the transition function) and has to repeatedly play for a given number of episodes while reasoning about the problem's optimal solution. Unlike…

    Submitted 23 February, 2020; originally announced February 2020.

  34. arXiv:1911.01679  [pdf, other]

    cs.LG stat.ML

    Apprenticeship Learning via Frank-Wolfe

    Authors: Tom Zahavy, Alon Cohen, Haim Kaplan, Yishay Mansour

    Abstract: We consider the applications of the Frank-Wolfe (FW) algorithm for Apprenticeship Learning (AL). In this setting, we are given a Markov Decision Process (MDP) without an explicit reward function. Instead, we observe an expert that acts according to some policy, and the goal is to find a policy whose feature expectations are closest to those of the expert policy. We formulate this problem as findin…

    Submitted 20 November, 2019; v1 submitted 5 November, 2019; originally announced November 2019.

  35. arXiv:1907.03346  [pdf, ps, other]

    cs.LG stat.ML

    Individual Regret in Cooperative Nonstochastic Multi-Armed Bandits

    Authors: Yogev Bar-On, Yishay Mansour

    Abstract: We study agents communicating over an underlying network by exchanging messages, in order to optimize their individual regret in a common nonstochastic multi-armed bandit problem. We derive regret minimization algorithms that guarantee for each agent $v$ an individual expected regret of $\widetilde{O}\left(\sqrt{\left(1+\frac{K}{\left|\mathcal{N}\left(v\right)\right|}\right)T}\right)$, where $T$ i…

    Submitted 16 November, 2019; v1 submitted 7 July, 2019; originally announced July 2019.

    Comments: To appear in Proc. Neural Information Processing Systems (NeurIPS), 2019

  36. arXiv:1906.09059  [pdf, ps, other]

    cs.LG stat.ML

    Thompson Sampling for Adversarial Bit Prediction

    Authors: Yuval Lewi, Haim Kaplan, Yishay Mansour

    Abstract: We study the Thompson sampling algorithm in an adversarial setting, specifically, for adversarial bit prediction. We characterize the bit sequences with the smallest and largest expected regret. Among sequences of length $T$ with $k < \frac{T}{2}$ zeros, the sequences of largest regret consist of alternating zeros and ones followed by the remaining ones, and the sequence of smallest regret consist…

    Submitted 28 December, 2019; v1 submitted 21 June, 2019; originally announced June 2019.

  37. arXiv:1906.00264  [pdf, ps, other]

    cs.LG stat.ML

    Graph-based Discriminators: Sample Complexity and Expressiveness

    Authors: Roi Livni, Yishay Mansour

    Abstract: A basic question in learning theory is to identify if two distributions are identical when we have access only to examples sampled from the distributions. This basic task is considered, for example, in the context of Generative Adversarial Networks (GANs), where a discriminator is trained to distinguish between a real-life distribution and a synthetic distribution. Classically, we use a hypothes…

    Submitted 1 June, 2019; originally announced June 2019.

  38. arXiv:1905.12624  [pdf, other]

    cs.LG stat.ML

    Top-k Combinatorial Bandits with Full-Bandit Feedback

    Authors: Idan Rejwan, Yishay Mansour

    Abstract: Top-k Combinatorial Bandits generalize multi-armed bandits, where at each round any subset of $k$ out of $n$ arms may be chosen and the sum of the rewards is gained. We address the full-bandit feedback, in which the agent observes only the sum of rewards, in contrast to the semi-bandit feedback, in which the agent observes also the individual arms' rewards. We present the Combinatorial Successive…

    Submitted 4 December, 2019; v1 submitted 28 May, 2019; originally announced May 2019.

  39. arXiv:1905.11797  [pdf, ps, other]

    cs.LG stat.ML

    ROI Maximization in Stochastic Online Decision-Making

    Authors: Nicolò Cesa-Bianchi, Tommaso Cesari, Yishay Mansour, Vianney Perchet

    Abstract: We introduce a novel theoretical framework for Return On Investment (ROI) maximization in repeated decision-making. Our setting is motivated by the use case of companies that regularly receive proposals for technological innovations and want to quickly decide whether they are worth implementing. We design an algorithm for learning ROI-maximizing decision-making policies over a sequence of innovati…

    Submitted 22 December, 2021; v1 submitted 28 May, 2019; originally announced May 2019.

  40. arXiv:1905.11361  [pdf, ps, other]

    cs.LG cs.CY stat.ML

    Efficient candidate screening under multiple tests and implications for fairness

    Authors: Lee Cohen, Zachary C. Lipton, Yishay Mansour

    Abstract: When recruiting job candidates, employers rarely observe their underlying skill level directly. Instead, they must administer a series of interviews and/or collate other noisy signals in order to estimate the worker's skill. Traditional economics papers address screening models where employers access worker skill via a single noisy signal. In this paper, we extend this theoretical analysis to a mu…

    Submitted 27 May, 2019; originally announced May 2019.

  41. arXiv:1905.09704  [pdf, other]

    cs.LG stat.ML

    Unknown mixing times in apprenticeship and reinforcement learning

    Authors: Tom Zahavy, Alon Cohen, Haim Kaplan, Yishay Mansour

    Abstract: We derive and analyze learning algorithms for apprenticeship learning, policy evaluation, and policy gradient for average reward criteria. Existing algorithms explicitly require an upper bound on the mixing time. In contrast, we build on ideas from Markov chain theory and derive sampling algorithms that do not require such an upper bound. For these algorithms, we provide theoretical bounds on thei…

    Submitted 20 June, 2020; v1 submitted 23 May, 2019; originally announced May 2019.

  42. arXiv:1905.07773  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    Online Convex Optimization in Adversarial Markov Decision Processes

    Authors: Aviv Rosenberg, Yishay Mansour

    Abstract: We consider online learning in episodic loop-free Markov decision processes (MDPs), where the loss function can change arbitrarily between episodes, and the transition function is not known to the learner. We show an $\tilde{O}(L|X|\sqrt{|A|T})$ regret bound, where $T$ is the number of episodes, $X$ is the state space, $A$ is the action space, and $L$ is the length of each episode. Our online algorit…

    Submitted 19 May, 2019; originally announced May 2019.

  43. arXiv:1904.03602  [pdf, ps, other]

    cs.LG cs.DS stat.ML

    Competitive ratio versus regret minimization: achieving the best of both worlds

    Authors: Amit Daniely, Yishay Mansour

    Abstract: We consider online algorithms under both the competitive ratio criterion and the regret minimization one. Our main goal is to build a unified methodology that would be able to guarantee both criteria simultaneously. For a general class of online algorithms, namely any Metrical Task System (MTS), we show that one can simultaneously guarantee the best known competitive ratio and a natural regret bo…

    Submitted 7 April, 2019; originally announced April 2019.

  44. arXiv:1902.06223  [pdf, ps, other]

    cs.LG stat.ML

    Learning Linear-Quadratic Regulators Efficiently with only $\sqrt{T}$ Regret

    Authors: Alon Cohen, Tomer Koren, Yishay Mansour

    Abstract: We present the first computationally-efficient algorithm with $\widetilde O(\sqrt{T})$ regret for learning in Linear Quadratic Control systems with unknown dynamics. By that, we resolve an open question of Abbasi-Yadkori and Szepesvári (2011) and Dean, Mania, Matni, Recht, and Tu (2018).

    Submitted 23 February, 2019; v1 submitted 17 February, 2019; originally announced February 2019.

  45. arXiv:1902.05017  [pdf, ps, other]

    cs.LG stat.ML

    Differentially Private Learning of Geometric Concepts

    Authors: Haim Kaplan, Yishay Mansour, Yossi Matias, Uri Stemmer

    Abstract: We present differentially private efficient algorithms for learning union of polygons in the plane (which are not necessarily convex). Our algorithms achieve $(α,β)$-PAC learning and $(ε,δ)$-differential privacy using a sample of size $\tilde{O}\left(\frac{1}{αε}k\log d\right)$, where the domain is $[d]\times[d]$ and $k$ is the number of edges in the union of polygons.

    Submitted 13 February, 2019; originally announced February 2019.

  46. arXiv:1902.04741  [pdf, other]

    cs.LG cs.DS stat.ML

    Learning to Screen

    Authors: Alon Cohen, Avinatan Hassidim, Haim Kaplan, Yishay Mansour, Shay Moran

    Abstract: Imagine a large firm with multiple departments that plans a large recruitment. Candidates arrive one-by-one, and for each candidate the firm decides, based on her data (CV, skills, experience, etc), whether to summon her for an interview. The firm wants to recruit the best candidates while minimizing the number of interviews. We model such scenarios as an assignment problem between items (candidat…

    Submitted 31 May, 2019; v1 submitted 13 February, 2019; originally announced February 2019.

    Comments: 15 pages, 1 figure. Extended the model and the results, changed the title, and added a new lower bound

  47. arXiv:1810.09346  [pdf, ps, other]

    cs.LG stat.ML

    Adversarial Online Learning with noise

    Authors: Alon Resler, Yishay Mansour

    Abstract: We present and study models of adversarial online learning where the feedback observed by the learner is noisy, and the feedback is either full information feedback or bandit feedback. Specifically, we consider binary losses XORed with the noise, which is a Bernoulli random variable. We consider both a constant noise rate and a variable noise rate. Our main results are tight regret bounds for lear…

    Submitted 4 November, 2018; v1 submitted 22 October, 2018; originally announced October 2018.

  48. arXiv:1810.02180  [pdf, other]

    cs.LG stat.ML

    Improved Generalization Bounds for Adversarially Robust Learning

    Authors: Idan Attias, Aryeh Kontorovich, Yishay Mansour

    Abstract: We consider a model of robust learning in an adversarial environment. The learner gets uncorrupted training data with access to possible corruptions that may be affected by the adversary during testing. The learner's goal is to build a robust classifier, which will be tested on future adversarial examples. The adversary is limited to $k$ possible corruptions for each input. We model the learner-ad…

    Submitted 1 July, 2022; v1 submitted 4 October, 2018; originally announced October 2018.

    Comments: JMLR camera ready

  49. arXiv:1806.07104  [pdf, other]

    cs.LG stat.ML

    Online Linear Quadratic Control

    Authors: Alon Cohen, Avinatan Hassidim, Tomer Koren, Nevena Lazic, Yishay Mansour, Kunal Talwar

    Abstract: We study the problem of controlling linear time-invariant systems with known noisy dynamics and adversarially chosen quadratic losses. We present the first efficient online learning algorithms in this setting that guarantee $O(\sqrt{T})$ regret under mild assumptions, where $T$ is the time horizon. Our algorithms rely on a novel SDP relaxation for the steady-state distribution of the system. Cruci…

    Submitted 19 June, 2018; originally announced June 2018.

  50. arXiv:1803.04674  [pdf, other]

    cs.LG cs.AI stat.ML

    Hierarchical Reinforcement Learning: Approximating Optimal Discounted TSP Using Local Policies

    Authors: Tom Zahavy, Avinatan Hasidim, Haim Kaplan, Yishay Mansour

    Abstract: In this work, we provide theoretical guarantees for reward decomposition in deterministic MDPs. Reward decomposition is a special case of Hierarchical Reinforcement Learning, that allows one to learn many policies in parallel and combine them into a composite solution. Our approach builds on mapping this problem into a Reward Discounted Traveling Salesman Problem, and then deriving approximate sol…

    Submitted 13 March, 2018; originally announced March 2018.