Showing 1–8 of 8 results for author: Garcelon, E

  1. arXiv:2312.15458  [pdf]

    stat.ML cs.LG

    Conservative Exploration for Policy Optimization via Off-Policy Policy Evaluation

    Authors: Paul Daoudi, Mathias Formoso, Othman Gaizi, Achraf Azize, Evrard Garcelon

    Abstract: A precondition for the deployment of a Reinforcement Learning agent to a real-world system is to provide guarantees on the learning process. While a learning algorithm will eventually converge to a good policy, there are no guarantees on the performance of the exploratory policies. We study the problem of conservative exploration, where the learner must at least be able to guarantee its performanc…

    Submitted 24 December, 2023; originally announced December 2023.

  2. arXiv:2112.06517  [pdf, other]

    cs.LG stat.ML

    Top $K$ Ranking for Multi-Armed Bandit with Noisy Evaluations

    Authors: Evrard Garcelon, Vashist Avadhanula, Alessandro Lazaric, Matteo Pirotta

    Abstract: We consider a multi-armed bandit setting where, at the beginning of each round, the learner receives noisy independent, and possibly biased, \emph{evaluations} of the true reward of each arm and it selects $K$ arms with the objective of accumulating as much reward as possible over $T$ rounds. Under the assumption that at each round the true reward of each arm is drawn from a fixed distribution, we…

    Submitted 12 April, 2022; v1 submitted 13 December, 2021; originally announced December 2021.

  3. arXiv:2106.11692  [pdf, ps, other]

    cs.LG stat.ML

    A Reduction-Based Framework for Conservative Bandits and Reinforcement Learning

    Authors: Yunchang Yang, Tianhao Wu, Han Zhong, Evrard Garcelon, Matteo Pirotta, Alessandro Lazaric, Liwei Wang, Simon S. Du

    Abstract: In this paper, we present a reduction-based framework for conservative bandits and RL, in which our core technique is to calculate the necessary and sufficient budget obtained from running the baseline policy. For lower bounds, we improve the existing lower bound for conservative multi-armed bandits and obtain new lower bounds for conservative linear bandits, tabular RL and low-rank MDP, through a…

    Submitted 16 March, 2022; v1 submitted 22 June, 2021; originally announced June 2021.

  4. arXiv:2002.03839  [pdf, other]

    cs.LG stat.ML

    Adversarial Attacks on Linear Contextual Bandits

    Authors: Evrard Garcelon, Baptiste Roziere, Laurent Meunier, Jean Tarbouriech, Olivier Teytaud, Alessandro Lazaric, Matteo Pirotta

    Abstract: Contextual bandit algorithms are applied in a wide range of domains, from advertising to recommender systems, from clinical trials to education. In many of these domains, malicious agents may have incentives to attack the bandit algorithm to induce it to perform a desired behavior. For instance, an unscrupulous ad publisher may try to increase their own revenue at the expense of the advertisers; a…

    Submitted 23 October, 2020; v1 submitted 10 February, 2020; originally announced February 2020.

  5. arXiv:2002.03221  [pdf, other]

    cs.LG stat.ML

    Improved Algorithms for Conservative Exploration in Bandits

    Authors: Evrard Garcelon, Mohammad Ghavamzadeh, Alessandro Lazaric, Matteo Pirotta

    Abstract: In many fields such as digital marketing, healthcare, finance, and robotics, it is common to have a well-tested and reliable baseline policy running in production (e.g., a recommender system). Nonetheless, the baseline policy is often suboptimal. In this case, it is desirable to deploy online learning algorithms (e.g., a multi-armed bandit algorithm) that interact with the system to learn a better…

    Submitted 8 February, 2020; originally announced February 2020.

  6. arXiv:2002.03218  [pdf, other]

    cs.LG stat.ML

    Conservative Exploration in Reinforcement Learning

    Authors: Evrard Garcelon, Mohammad Ghavamzadeh, Alessandro Lazaric, Matteo Pirotta

    Abstract: While learning in an unknown Markov Decision Process (MDP), an agent should trade off exploration to discover new information about the MDP, and exploitation of the current knowledge to maximize the reward. Although the agent will eventually learn a good or optimal policy, there is no guarantee on the quality of the intermediate policies. This lack of control is undesired in real-world application…

    Submitted 15 July, 2020; v1 submitted 8 February, 2020; originally announced February 2020.

    Comments: AISTATS 2020

  7. arXiv:1912.03517  [pdf, other]

    stat.ML cs.LG

    No-Regret Exploration in Goal-Oriented Reinforcement Learning

    Authors: Jean Tarbouriech, Evrard Garcelon, Michal Valko, Matteo Pirotta, Alessandro Lazaric

    Abstract: Many popular reinforcement learning problems (e.g., navigation in a maze, some Atari games, mountain car) are instances of the episodic setting under its stochastic shortest path (SSP) formulation, where an agent has to achieve a goal state while minimizing the cumulative cost. Despite the popularity of this setting, the exploration-exploitation dilemma has been sparsely studied in general SSP pro…

    Submitted 17 August, 2020; v1 submitted 7 December, 2019; originally announced December 2019.

    Journal ref: International Conference on Machine Learning (ICML 2020)

  8. arXiv:1807.03558  [pdf, other]

    cs.LG stat.ML

    Bandits with Side Observations: Bounded vs. Logarithmic Regret

    Authors: Rémy Degenne, Evrard Garcelon, Vianney Perchet

    Abstract: We consider the classical stochastic multi-armed bandit but where, from time to time and roughly with frequency $ε$, an extra observation is gathered by the agent for free. We prove that, no matter how small $ε$ is, the agent can ensure a regret uniformly bounded in time. More precisely, we construct an algorithm with a regret smaller than $\sum_i \frac{\log(1/ε)}{Δ_i}$, up to multiplicative cons…

    Submitted 10 July, 2018; originally announced July 2018.

    Comments: Conference on Uncertainty in Artificial Intelligence (UAI) 2018, 21 pages