Showing 1–30 of 30 results for author: Combes, R

Searching in archive stat.
  1. arXiv:2502.17175  [pdf, other]

    stat.ML cs.LG

    Linear Bandits on Ellipsoids: Minimax Optimal Algorithms

    Authors: Raymond Zhang, Hedi Hadiji, Richard Combes

    Abstract: We consider linear stochastic bandits where the set of actions is an ellipsoid. We provide the first known minimax optimal algorithm for this problem. We first derive a novel information-theoretic lower bound on the regret of any algorithm, which must be at least $Ω(\min(d σ\sqrt{T} + d \|θ\|_{A}, \|θ\|_{A} T))$ where $d$ is the dimension, $T$ the time horizon, $σ^2$ the noise variance, $A$ a matr…

    Submitted 24 February, 2025; originally announced February 2025.

    Comments: 20 pages, 3 figures

  2. arXiv:2410.05441  [pdf, other]

    stat.ML cs.LG

    Thompson Sampling For Combinatorial Bandits: Polynomial Regret and Mismatched Sampling Paradox

    Authors: Raymond Zhang, Richard Combes

    Abstract: We consider Thompson Sampling (TS) for linear combinatorial semi-bandits and subgaussian rewards. We propose the first known TS whose finite-time regret does not scale exponentially with the dimension of the problem. We further show the "mismatched sampling paradox": A learner who knows the rewards distributions and samples from the correct posterior distribution can perform exponentially worse th…

    Submitted 7 October, 2024; originally announced October 2024.

    Comments: NeurIPS 2024

  3. arXiv:2104.03863  [pdf, other]

    cs.LG cs.CR stat.ML

    A single gradient step finds adversarial examples on random two-layers neural networks

    Authors: Sébastien Bubeck, Yeshwanth Cherapanamjeri, Gauthier Gidel, Rémi Tachet des Combes

    Abstract: Daniely and Schacham recently showed that gradient descent finds adversarial examples on random undercomplete two-layers ReLU neural networks. The term "undercomplete" refers to the fact that their proof only holds when the number of neurons is a vanishing fraction of the ambient dimension. We extend their result to the overcomplete case, where the number of neurons is larger than the dimension (y…

    Submitted 9 April, 2021; v1 submitted 8 April, 2021; originally announced April 2021.

    Comments: Added a comment about universal adversarial perturbations. 18 pages, 7 figures

  4. arXiv:2103.13059  [pdf, other]

    stat.ML cs.LG

    Towards Optimal Algorithms for Multi-Player Bandits without Collision Sensing Information

    Authors: Wei Huang, Richard Combes, Cindy Trinh

    Abstract: We propose a novel algorithm for multi-player multi-armed bandits without collision sensing information. Our algorithm circumvents two problems shared by all state-of-the-art algorithms: it does not need as an input a lower bound on the minimal expected reward of an arm, and its performance does not scale inversely proportionally to the minimal expected reward. We prove a theoretical regret upper…

    Submitted 6 June, 2022; v1 submitted 24 March, 2021; originally announced March 2021.

    Comments: 24 pages, COLT 2022

  5. arXiv:2102.10200  [pdf, other]

    cs.LG stat.ML

    A High Performance, Low Complexity Algorithm for Multi-Player Bandits Without Collision Sensing Information

    Authors: Cindy Trinh, Richard Combes

    Abstract: Motivated by applications in cognitive radio networks, we consider the decentralized multi-player multi-armed bandit problem, without collision nor sensing information. We propose Randomized Selfish KL-UCB, an algorithm with very low computational complexity, inspired by the Selfish KL-UCB algorithm, which has been abandoned as it provably performs sub-optimally in some cases. We subject Randomize…

    Submitted 19 February, 2021; originally announced February 2021.

    Comments: 14 pages

  6. arXiv:2102.07254  [pdf, ps, other]

    stat.ML cs.LG

    Asymptotically Optimal Strategies For Combinatorial Semi-Bandits in Polynomial Time

    Authors: Thibaut Cuvelier, Richard Combes, Eric Gourdin

    Abstract: We consider combinatorial semi-bandits with uncorrelated Gaussian rewards. In this article, we propose the first method, to the best of our knowledge, that enables to compute the solution of the Graves-Lai optimization problem in polynomial time for many combinatorial structures of interest. In turn, this immediately yields the first known approach to implement asymptotically optimal algorithms in…

    Submitted 14 February, 2021; originally announced February 2021.

    Comments: 26 pages

  7. arXiv:2102.05628  [pdf, ps, other]

    stat.ML cs.LG

    On the Regularity of Attention

    Authors: James Vuckovic, Aristide Baratin, Remi Tachet des Combes

    Abstract: Attention is a powerful component of modern neural networks across a wide variety of domains. In this paper, we seek to quantify the regularity (i.e. the amount of smoothness) of the attention operation. To accomplish this goal, we propose a new mathematical framework that uses measure theory and integral operators to model attention. We show that this framework is consistent with the usual defini…

    Submitted 10 February, 2021; originally announced February 2021.

    Comments: Conference version of arXiv:2007.02876

  8. arXiv:2102.05502  [pdf, other]

    stat.ML cs.LG

    On the Suboptimality of Thompson Sampling in High Dimensions

    Authors: Raymond Zhang, Richard Combes

    Abstract: In this paper we consider Thompson Sampling (TS) for combinatorial semi-bandits. We demonstrate that, perhaps surprisingly, TS is sub-optimal for this problem in the sense that its regret scales exponentially in the ambient dimension, and its minimax regret scales almost linearly. This phenomenon occurs under a wide variety of assumptions including both non-linear and linear reward functions, with…

    Submitted 20 October, 2021; v1 submitted 10 February, 2021; originally announced February 2021.

    Comments: NeurIPS 2021 - 34 pages

  9. arXiv:2009.05475  [pdf, other]

    cs.LG cs.CV stat.ML

    Adversarial score matching and improved sampling for image generation

    Authors: Alexia Jolicoeur-Martineau, Rémi Piché-Taillefer, Rémi Tachet des Combes, Ioannis Mitliagkas

    Abstract: Denoising Score Matching with Annealed Langevin Sampling (DSM-ALS) has recently found success in generative modeling. The approach works by first training a neural network to estimate the score of a distribution, and then using Langevin dynamics to sample from the data distribution assumed by the score network. Despite the convincing visual quality of samples, this method appears to perform worse…

    Submitted 10 October, 2020; v1 submitted 11 September, 2020; originally announced September 2020.

    Comments: Code at https://github.com/AlexiaJM/AdversarialConsistentScoreMatching

  10. arXiv:2007.08387  [pdf, other]

    cs.LO stat.ML

    Solving Random Parity Games in Polynomial Time

    Authors: Richard Combes, Mikael Touati

    Abstract: We consider the problem of solving random parity games. We prove that parity games exhibit a phase transition threshold above $d_P$, so that when the degree of the graph that defines the game has a degree $d > d_P$ then there exists a polynomial time algorithm that solves the game with high probability when the number of nodes goes to infinity. We further propose the SWCP (Self-Winning Cycles Propa…

    Submitted 16 July, 2020; originally announced July 2020.

    Comments: 23 pages

  11. arXiv:2007.02876  [pdf, ps, other]

    stat.ML cs.LG

    A Mathematical Theory of Attention

    Authors: James Vuckovic, Aristide Baratin, Remi Tachet des Combes

    Abstract: Attention is a powerful component of modern neural networks across a wide variety of domains. However, despite its ubiquity in machine learning, there is a gap in our understanding of attention from a theoretical point of view. We propose a framework to fill this gap by building a mathematically equivalent model of attention using measure theory. With this model, we are able to interpret self-atte…

    Submitted 20 July, 2020; v1 submitted 6 July, 2020; originally announced July 2020.

  12. arXiv:2006.07217  [pdf, other]

    cs.LG stat.ML

    Deep Reinforcement and InfoMax Learning

    Authors: Bogdan Mazoure, Remi Tachet des Combes, Thang Doan, Philip Bachman, R Devon Hjelm

    Abstract: We begin with the hypothesis that a model-free agent whose representations are predictive of properties of future states (beyond expected rewards) will be more capable of solving and adapting to new RL problems. To test that hypothesis, we introduce an objective based on Deep InfoMax (DIM) which trains the agent to predict the future by maximizing the mutual information between its internal repres…

    Submitted 16 November, 2020; v1 submitted 12 June, 2020; originally announced June 2020.

    Comments: NeurIPS 2020

  13. arXiv:2002.07258  [pdf, other]

    stat.ML cs.LG math.OC

    Statistically Efficient, Polynomial Time Algorithms for Combinatorial Semi Bandits

    Authors: Thibaut Cuvelier, Richard Combes, Eric Gourdin

    Abstract: We consider combinatorial semi-bandits over a set of arms ${\cal X} \subset \{0,1\}^d$ where rewards are uncorrelated across items. For this problem, the algorithm ESCB yields the smallest known regret bound $R(T) = {\cal O}\Big( {d (\ln m)^2 (\ln T) \over Δ_{\min} }\Big)$, but it has computational complexity ${\cal O}(|{\cal X}|)$ which is typically exponential in $d$, and cannot be used in large…

    Submitted 13 January, 2021; v1 submitted 17 February, 2020; originally announced February 2020.

  14. arXiv:1912.03074  [pdf, other]

    stat.ML cs.LG

    Solving Bernoulli Rank-One Bandits with Unimodal Thompson Sampling

    Authors: Cindy Trinh, Emilie Kaufmann, Claire Vernade, Richard Combes

    Abstract: Stochastic Rank-One Bandits (Katarya et al, (2017a,b)) are a simple framework for regret minimization problems over rank-one matrices of arms. The initially proposed algorithms are proved to have logarithmic regret, but do not match the existing lower bound for this problem. We close this gap by first proving that rank-one bandits are a particular instance of unimodal bandits, and then providing a…

    Submitted 6 December, 2019; originally announced December 2019.

  15. arXiv:1911.05873  [pdf, ps, other]

    cs.LG stat.ML

    A Reduction from Reinforcement Learning to No-Regret Online Learning

    Authors: Ching-An Cheng, Remi Tachet des Combes, Byron Boots, Geoff Gordon

    Abstract: We present a reduction from reinforcement learning (RL) to no-regret online learning based on the saddle-point formulation of RL, by which "any" online algorithm with sublinear regret can generate policies with provable performance guarantees. This new perspective decouples the RL problem into two parts: regret minimization and function approximation. The first part admits a standard online-learni…

    Submitted 1 January, 2020; v1 submitted 13 November, 2019; originally announced November 2019.

  16. arXiv:1909.05236  [pdf, other]

    cs.LG cs.AI stat.ML

    Safe Policy Improvement with an Estimated Baseline Policy

    Authors: Thiago D. Simão, Romain Laroche, Rémi Tachet des Combes

    Abstract: Previous work has shown the unreliability of existing algorithms in the batch Reinforcement Learning setting, and proposed the theoretically-grounded Safe Policy Improvement with Baseline Bootstrapping (SPIBB) fix: reproduce the baseline policy in the uncertain state-action pairs, in order to control the variance on the trained policy performance. However, in many real-world applications such as d…

    Submitted 28 December, 2020; v1 submitted 11 September, 2019; originally announced September 2019.

    Comments: Published at AAMAS 2020

  17. arXiv:1907.05079  [pdf, other]

    cs.LG cs.AI stat.ML

    Safe Policy Improvement with Soft Baseline Bootstrapping

    Authors: Kimia Nadjahi, Romain Laroche, Rémi Tachet des Combes

    Abstract: Batch Reinforcement Learning (Batch RL) consists in training a policy using trajectories collected with another policy, called the behavioural policy. Safe policy improvement (SPI) provides guarantees with high probability that the trained policy performs better than the behavioural policy, also called baseline in this setting. Previous work shows that the SPI objective improves mean performance a…

    Submitted 11 July, 2019; originally announced July 2019.

    Comments: Accepted paper at ECML-PKDD2019

  18. arXiv:1901.09453  [pdf, other]

    cs.LG cs.AI stat.ML

    On Learning Invariant Representation for Domain Adaptation

    Authors: Han Zhao, Remi Tachet des Combes, Kun Zhang, Geoffrey J. Gordon

    Abstract: Due to the ability of deep neural nets to learn rich representations, recent advances in unsupervised domain adaptation have focused on learning domain-invariant features that achieve a small error on the source domain. The hope is that the learnt representation, together with the hypothesis learnt from the source domain, can generalize to the target domain. In this paper, we first construct a sim…

    Submitted 30 May, 2019; v1 submitted 27 January, 2019; originally announced January 2019.

    Comments: Compared with the last version, the current one adds a new corollary for the case of different feature transformations (encoders) on source/target domains. Fix a typo in Fig. 1

  19. arXiv:1812.05159  [pdf, other]

    cs.LG stat.ML

    An Empirical Study of Example Forgetting during Deep Neural Network Learning

    Authors: Mariya Toneva, Alessandro Sordoni, Remi Tachet des Combes, Adam Trischler, Yoshua Bengio, Geoffrey J. Gordon

    Abstract: Inspired by the phenomenon of catastrophic forgetting, we investigate the learning dynamics of neural networks as they train on single classification tasks. Our goal is to understand whether a related phenomenon occurs when data does not undergo a clear distributional shift. We define a `forgetting event' to have occurred when an individual training example transitions from being classified correc…

    Submitted 15 November, 2019; v1 submitted 12 December, 2018; originally announced December 2018.

    Comments: ICLR 2019

  20. arXiv:1806.06047  [pdf, other]

    stat.ML cs.LG

    Computationally Efficient Estimation of the Spectral Gap of a Markov Chain

    Authors: Richard Combes, Mikael Touati

    Abstract: We consider the problem of estimating from sample paths the absolute spectral gap $γ_*$ of a reversible, irreducible and aperiodic Markov chain $(X_t)_{t \in \mathbb{N}}$ over a finite state space $Ω$. We propose the ${\tt UCPI}$ (Upper Confidence Power Iteration) algorithm for this problem, a low-complexity algorithm which estimates the spectral gap in time ${\cal O}(n)$ and memory space…

    Submitted 7 February, 2019; v1 submitted 15 June, 2018; originally announced June 2018.

    Comments: 32 pages

  21. arXiv:1806.04047  [pdf, other]

    stat.ML cs.LG

    High Dimensional Data Enrichment: Interpretable, Fast, and Data-Efficient

    Authors: Amir Asiaee, Samet Oymak, Kevin R. Coombes, Arindam Banerjee

    Abstract: We consider the problem of multi-task learning in the high dimensional setting. In particular, we introduce an estimator and investigate its statistical and computational properties for the problem of multiple connected linear regressions known as Data Enrichment/Sharing. The between-tasks connections are captured by a cross-tasks \emph{common parameter}, which gets refined by per-task \emph{indiv…

    Submitted 30 June, 2023; v1 submitted 11 June, 2018; originally announced June 2018.

  22. arXiv:1712.06924  [pdf, other]

    cs.LG cs.AI stat.ML

    Safe Policy Improvement with Baseline Bootstrapping

    Authors: Romain Laroche, Paul Trichelair, Rémi Tachet des Combes

    Abstract: This paper considers Safe Policy Improvement (SPI) in Batch Reinforcement Learning (Batch RL): from a fixed dataset and without direct access to the true environment, train a policy that is guaranteed to perform at least as well as the baseline policy used to collect the data. Our approach, called SPI with Baseline Bootstrapping (SPIBB), is inspired by the knows-what-it-knows paradigm: it bootstra…

    Submitted 7 June, 2019; v1 submitted 19 December, 2017; originally announced December 2017.

    Comments: accepted as a long oral at ICML2019

  23. arXiv:1711.00400  [pdf, other]

    stat.ML cs.AI cs.LG math.OC

    Minimal Exploration in Structured Stochastic Bandits

    Authors: Richard Combes, Stefan Magureanu, Alexandre Proutiere

    Abstract: This paper introduces and addresses a wide class of stochastic bandit problems where the function mapping the arm to the corresponding reward exhibits some known structural properties. Most existing structures (e.g. linear, Lipschitz, unimodal, combinatorial, dueling, ...) are covered by our framework. We derive an asymptotic instance-specific regret lower bound for these problems, and develop OSS…

    Submitted 1 November, 2017; originally announced November 2017.

    Comments: 13 pages, NIPS 2017

  24. arXiv:1703.01347  [pdf, other]

    cs.AI cs.LG stat.ML

    Contextual Linear Bandits under Noisy Features: Towards Bayesian Oracles

    Authors: Jung-hun Kim, Se-Young Yun, Minchan Jeong, Jun Hyun Nam, Jinwoo Shin, Richard Combes

    Abstract: We study contextual linear bandit problems under feature uncertainty, where the features are noisy and have missing entries. To address the challenges posed by this noise, we analyze Bayesian oracles given the observed noisy features. Our Bayesian analysis reveals that the optimal hypothesis can significantly deviate from the underlying realizability function, depending on the noise characteristic…

    Submitted 10 October, 2024; v1 submitted 3 March, 2017; originally announced March 2017.

    Comments: AISTATS2023; minor corrections

  25. arXiv:1606.00226  [pdf, other]

    stat.ML cs.HC cs.LG cs.SI

    A Minimax Optimal Algorithm for Crowdsourcing

    Authors: Thomas Bonald, Richard Combes

    Abstract: We consider the problem of accurately estimating the reliability of workers based on noisy labels they provide, which is a fundamental question in crowdsourcing. We propose a novel lower bound on the minimax estimation error which applies to any estimation procedure. We further propose Triangular Estimation (TE), an algorithm for estimating the reliability of workers. TE has low complexity, may be…

    Submitted 25 October, 2017; v1 submitted 1 June, 2016; originally announced June 2016.

    Comments: 19 pages, NIPS 2017

  26. arXiv:1602.07107  [pdf, other]

    stat.ML cs.LG

    A Streaming Algorithm for Crowdsourced Data Classification

    Authors: Thomas Bonald, Richard Combes

    Abstract: We propose a streaming algorithm for the binary classification of data based on crowdsourcing. The algorithm learns the competence of each labeller by comparing her labels to those of other labellers on the same tasks and uses this information to minimize the prediction error rate on each task. We provide performance guarantees of our algorithm for a fixed population of independent labellers. In p…

    Submitted 23 February, 2016; originally announced February 2016.

    Comments: 23 pages

  27. arXiv:1502.03475  [pdf, other]

    cs.LG math.OC stat.ML

    Combinatorial Bandits Revisited

    Authors: Richard Combes, M. Sadegh Talebi, Alexandre Proutiere, Marc Lelarge

    Abstract: This paper investigates stochastic and adversarial combinatorial multi-armed bandit problems. In the stochastic setting under semi-bandit feedback, we derive a problem-specific regret lower bound, and discuss its scaling with the dimension of the decision space. We propose ESCB, an algorithm that efficiently exploits the structure of the problem and provide a finite-time analysis of its regret. ES…

    Submitted 5 November, 2015; v1 submitted 11 February, 2015; originally announced February 2015.

    Comments: 30 pages, Advances in Neural Information Processing Systems 28 (NIPS 2015)

  28. arXiv:1405.5096  [pdf, ps, other]

    cs.LG stat.ML

    Unimodal Bandits: Regret Lower Bounds and Optimal Algorithms

    Authors: Richard Combes, Alexandre Proutiere

    Abstract: We consider stochastic multi-armed bandits where the expected reward is a unimodal function over partially ordered arms. This important class of problems has been recently investigated in (Cope 2009, Yu 2011). The set of arms is either discrete, in which case arms correspond to the vertices of a finite graph whose structure represents similarity in rewards, or continuous, in which case arms belong…

    Submitted 20 May, 2014; originally announced May 2014.

    Comments: ICML 2014 (technical report). arXiv admin note: text overlap with arXiv:1307.7309

  29. Bayesian sparse graphical models for classification with application to protein expression data

    Authors: Veerabhadran Baladandayuthapani, Rajesh Talluri, Yuan Ji, Kevin R. Coombes, Yiling Lu, Bryan T. Hennessy, Michael A. Davies, Bani K. Mallick

    Abstract: Reverse-phase protein array (RPPA) analysis is a powerful, relatively new platform that allows for high-throughput, quantitative analysis of protein networks. One of the challenges that currently limit the potential of this technology is the lack of methods that allow for accurate data modeling and identification of related networks and samples. Such models may improve the accuracy of biological s…

    Submitted 21 November, 2014; v1 submitted 29 March, 2014; originally announced March 2014.

    Comments: Published in at http://dx.doi.org/10.1214/14-AOAS722 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOAS-AOAS722

    Journal ref: Annals of Applied Statistics 2014, Vol. 8, No. 3, 1443-1468

  30. Deriving chemosensitivity from cell lines: Forensic bioinformatics and reproducible research in high-throughput biology

    Authors: Keith A. Baggerly, Kevin R. Coombes

    Abstract: High-throughput biological assays such as microarrays let us ask very detailed questions about how diseases operate, and promise to let us personalize therapy. Data processing, however, is often not described well enough to allow for exact reproduction of the results, leading to exercises in "forensic bioinformatics" where aspects of raw data and reported results are used to infer what methods mus…

    Submitted 6 October, 2010; originally announced October 2010.

    Comments: Published in at http://dx.doi.org/10.1214/09-AOAS291 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOAS-AOAS291

    Journal ref: Annals of Applied Statistics 2009, Vol. 3, No. 4, 1309-1334