
Showing 1–9 of 9 results for author: Gouverneur, A

  1. arXiv:2502.11953

    stat.ML cs.LG

    Refined PAC-Bayes Bounds for Offline Bandits

    Authors: Amaury Gouverneur, Tobias J. Oechtering, Mikael Skoglund

    Abstract: In this paper, we present refined probabilistic bounds on empirical reward estimates for off-policy learning in bandit problems. We build on the PAC-Bayesian bounds from Seldin et al. (2010) and improve on their results using a new parameter optimization approach introduced by Rodríguez et al. (2024). This technique is based on a discretization of the space of possible events to optimize the "in p…

    Submitted 17 February, 2025; originally announced February 2025.

    Comments: 6 pages

  2. arXiv:2502.02140

    stat.ML cs.LG

    An Information-Theoretic Analysis of Thompson Sampling with Infinite Action Spaces

    Authors: Amaury Gouverneur, Borja Rodriguez Gálvez, Tobias Oechtering, Mikael Skoglund

    Abstract: This paper studies the Bayesian regret of the Thompson Sampling algorithm for bandit problems, building on the information-theoretic framework introduced by Russo and Van Roy (2015). Specifically, it extends the rate-distortion analysis of Dong and Van Roy (2018), which provides near-optimal bounds for linear bandits. A limitation of these results is the assumption of a finite action space. We add…

    Submitted 4 February, 2025; originally announced February 2025.

    Comments: 5 pages, accepted to ICASSP

  3. arXiv:2412.02861

    stat.ML cs.LG

    An Information-Theoretic Analysis of Thompson Sampling for Logistic Bandits

    Authors: Amaury Gouverneur, Borja Rodríguez-Gálvez, Tobias J. Oechtering, Mikael Skoglund

    Abstract: We study the performance of the Thompson Sampling algorithm for logistic bandit problems. In this setting, an agent receives binary rewards with probabilities determined by a logistic function, $\exp(β\langle a, θ\rangle)/(1+\exp(β\langle a, θ\rangle))$, with slope parameter $β>0$, and where both the action $a\in \mathcal{A}$ and parameter $θ\in \mathcal{O}$ lie within the $d$-dimensional unit bal…

    Submitted 20 February, 2025; v1 submitted 3 December, 2024; originally announced December 2024.

    Comments: 21 pages, under review

  4. arXiv:2410.16013

    cs.LG cs.IT

    Information-Theoretic Minimax Regret Bounds for Reinforcement Learning based on Duality

    Authors: Raghav Bongole, Amaury Gouverneur, Borja Rodríguez-Gálvez, Tobias J. Oechtering, Mikael Skoglund

    Abstract: We study agents acting in an unknown environment where the agent's goal is to find a robust policy. We consider robust policies as policies that achieve high cumulative rewards for all possible environments. To this end, we consider agents minimizing the maximum regret over different environment parameters, leading to the study of minimax regret. This research focuses on deriving information-theor…

    Submitted 21 October, 2024; originally announced October 2024.

  5. arXiv:2403.03361

    stat.ML cs.LG

    Chained Information-Theoretic bounds and Tight Regret Rate for Linear Bandit Problems

    Authors: Amaury Gouverneur, Borja Rodríguez-Gálvez, Tobias J. Oechtering, Mikael Skoglund

    Abstract: This paper studies the Bayesian regret of a variant of the Thompson-Sampling algorithm for bandit problems. It builds upon the information-theoretic framework of [Russo and Van Roy, 2015] and, more specifically, on the rate-distortion analysis from [Dong and Van Roy, 2020], where they proved a bound with regret rate of $O(d\sqrt{T \log(T)})$ for the $d$-dimensional linear bandit setting. We focus…

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: 15 pages: 8 of main text and 7 of appendices

  6. arXiv:2304.13593

    stat.ML cs.LG

    Thompson Sampling Regret Bounds for Contextual Bandits with sub-Gaussian rewards

    Authors: Amaury Gouverneur, Borja Rodríguez-Gálvez, Tobias J. Oechtering, Mikael Skoglund

    Abstract: In this work, we study the performance of the Thompson Sampling algorithm for Contextual Bandit problems based on the framework introduced by Neu et al. and their concept of lifted information ratio. First, we prove a comprehensive bound on the Thompson Sampling expected cumulative regret that depends on the mutual information of the environment parameters and the history. Then, we introduce new b…

    Submitted 26 April, 2023; originally announced April 2023.

    Comments: 8 pages: 5 of the main text, 1 of references, and 2 of appendices. Accepted to ISIT 2023

  7. arXiv:2207.08735

    cs.LG stat.ML

    An Information-Theoretic Analysis of Bayesian Reinforcement Learning

    Authors: Amaury Gouverneur, Borja Rodríguez-Gálvez, Tobias J. Oechtering, Mikael Skoglund

    Abstract: Building on the framework introduced by Xu and Raginsky [1] for supervised learning problems, we study the best achievable performance for model-based Bayesian reinforcement learning problems. With this purpose, we define minimum Bayesian regret (MBR) as the difference between the maximum expected cumulative reward obtainable either by learning from the collected data or by knowing the environment…

    Submitted 18 July, 2022; originally announced July 2022.

    Comments: 10 pages: 6 of the main text, 1 of references, and 3 of appendices

  8. Optimal Intermittent Particle Filter

    Authors: Antoine Aspeel, Amaury Gouverneur, Raphaël M. Jungers, Benoit Macq

    Abstract: The problem of the optimal allocation (in the expected mean square error sense) of a measurement budget for particle filtering is addressed. We propose three different optimal intermittent filters, whose optimality criteria depend on the information available at the time of decision making. For the first, the stochastic program filter, the measurement times are given by a policy that determines wh…

    Submitted 13 April, 2022; originally announced April 2022.

    Comments: 13 pages, 7 figures, 3 tables

  9. arXiv:2005.08557

    eess.SY

    Optimal measurement budget allocation for particle filtering

    Authors: Antoine Aspeel, Amaury Gouverneur, Raphaël M. Jungers, Benoît Macq

    Abstract: Particle filtering is a powerful tool for target tracking. When the budget for observations is restricted, it is necessary to reduce the measurements to a limited amount of samples carefully selected. A discrete stochastic nonlinear dynamical system is studied over a finite time horizon. The problem of selecting the optimal measurement times for particle filtering is formalized as a combinatorial…

    Submitted 18 May, 2020; originally announced May 2020.

    Comments: 5 pages, 4 figures, conference paper