Showing 1–12 of 12 results for author: O'Donoghue, B

Searching in archive stat.
  1. arXiv:2110.15688 [pdf, other]

    stat.ML cs.LG

    Variational Bayesian Optimistic Sampling

    Authors: Brendan O'Donoghue, Tor Lattimore

    Abstract: We consider online sequential decision problems where an agent must balance exploration and exploitation. We derive a set of Bayesian 'optimistic' policies which, in the stochastic multi-armed bandit case, includes the Thompson sampling policy. We provide a new analysis showing that any algorithm producing policies in the optimistic set enjoys $\tilde O(\sqrt{AT})$ Bayesian regret for a problem wi…

    Submitted 29 October, 2021; originally announced October 2021.

  2. arXiv:2110.04629 [pdf, other]

    cs.LG cs.AI stat.ML

    The Neural Testbed: Evaluating Joint Predictions

    Authors: Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Botao Hao, Morteza Ibrahimi, Dieterich Lawson, Xiuyuan Lu, Brendan O'Donoghue, Benjamin Van Roy

    Abstract: Predictive distributions quantify uncertainties ignored by point estimates. This paper introduces The Neural Testbed: an open-source benchmark for controlled and principled evaluation of agents that generate such predictions. Crucially, the testbed assesses agents not only on the quality of their marginal predictions per input, but also on their joint predictions across many inputs. We evaluate a…

    Submitted 1 November, 2022; v1 submitted 9 October, 2021; originally announced October 2021.

  3. arXiv:2106.00669 [pdf, other]

    cs.AI cs.LG stat.ML

    Discovering Diverse Nearly Optimal Policies with Successor Features

    Authors: Tom Zahavy, Brendan O'Donoghue, Andre Barreto, Volodymyr Mnih, Sebastian Flennerhag, Satinder Singh

    Abstract: Finding different solutions to the same problem is a key aspect of intelligence associated with creativity and adaptation to novel situations. In reinforcement learning, a set of diverse policies can be useful for exploration, transfer, hierarchy, and robustness. We propose Diverse Successive Policies, a method for discovering policies that are diverse in the space of Successor Features, while ass…

    Submitted 4 January, 2022; v1 submitted 1 June, 2021; originally announced June 2021.

  4. arXiv:2106.00661 [pdf, other]

    cs.AI cs.LG stat.ML

    Reward is enough for convex MDPs

    Authors: Tom Zahavy, Brendan O'Donoghue, Guillaume Desjardins, Satinder Singh

    Abstract: Maximising a cumulative reward function that is Markov and stationary, i.e., defined over state-action pairs and independent of time, is sufficient to capture many kinds of goals in a Markov decision process (MDP). However, not all goals can be captured in this manner. In this paper we study convex MDPs in which goals are expressed as convex functions of the stationary distribution and show that t…

    Submitted 2 June, 2023; v1 submitted 1 June, 2021; originally announced June 2021.

  5. arXiv:2006.05145 [pdf, other]

    cs.LG stat.CO stat.ML

    Matrix games with bandit feedback

    Authors: Brendan O'Donoghue, Tor Lattimore, Ian Osband

    Abstract: We study a version of the classical zero-sum matrix game with unknown payoff matrix and bandit feedback, where the players only observe each other's actions and a noisy payoff. This generalizes the usual matrix game, where the payoff matrix is known to the players. Despite numerous applications, this problem has received relatively little attention. Although adversarial bandit algorithms achieve lo…

    Submitted 12 June, 2021; v1 submitted 9 June, 2020; originally announced June 2020.

  6. arXiv:1902.09592 [pdf, other]

    cs.LG stat.ML

    Verification of Non-Linear Specifications for Neural Networks

    Authors: Chongli Qin, Krishnamurthy Dvijotham, Brendan O'Donoghue, Rudy Bunel, Robert Stanforth, Sven Gowal, Jonathan Uesato, Grzegorz Swirszcz, Pushmeet Kohli

    Abstract: Prior work on neural network verification has focused on specifications that are linear functions of the output of the network, e.g., invariance of the classifier output under adversarial perturbations of the input. In this paper, we extend verification algorithms to be able to certify richer properties of neural networks. To do this we introduce the class of convex-relaxable specifications, which…

    Submitted 25 February, 2019; originally announced February 2019.

    Comments: ICLR conference paper

  7. arXiv:1809.05042 [pdf, other]

    math.OC cs.LG stat.ML

    Hamiltonian Descent Methods

    Authors: Chris J. Maddison, Daniel Paulin, Yee Whye Teh, Brendan O'Donoghue, Arnaud Doucet

    Abstract: We propose a family of optimization methods that achieve linear convergence using first-order gradient information and constant step sizes on a class of convex functions much larger than the smooth and strongly convex ones. This larger class includes functions whose second derivatives may be singular or unbounded at their minima. Our methods are discretizations of conformal Hamiltonian dynamics, w…

    Submitted 13 September, 2018; originally announced September 2018.

  8. arXiv:1807.09647 [pdf, other]

    cs.LG cs.AI stat.ML

    Variational Bayesian Reinforcement Learning with Regret Bounds

    Authors: Brendan O'Donoghue

    Abstract: In reinforcement learning the Q-values summarize the expected future rewards that the agent will attain. However, they cannot capture the epistemic uncertainty about those rewards. In this work we derive a new Bellman operator with associated fixed point we call the 'knowledge values'. These K-values compress both the expected future rewards and the epistemic uncertainty into a single value, so th…

    Submitted 6 December, 2022; v1 submitted 25 July, 2018; originally announced July 2018.

  9. arXiv:1805.10265 [pdf, other]

    cs.LG stat.ML

    Training verified learners with learned verifiers

    Authors: Krishnamurthy Dvijotham, Sven Gowal, Robert Stanforth, Relja Arandjelovic, Brendan O'Donoghue, Jonathan Uesato, Pushmeet Kohli

    Abstract: This paper proposes a new algorithmic framework, predictor-verifier training, to train neural networks that are verifiable, i.e., networks that provably satisfy some desired input-output properties. The key idea is to simultaneously train two networks: a predictor network that performs the task at hand, e.g., predicting labels given inputs, and a verifier network that computes a bound on how well t…

    Submitted 29 May, 2018; v1 submitted 25 May, 2018; originally announced May 2018.

  10. arXiv:1802.05666 [pdf, other]

    cs.LG cs.CR stat.ML

    Adversarial Risk and the Dangers of Evaluating Against Weak Attacks

    Authors: Jonathan Uesato, Brendan O'Donoghue, Aaron van den Oord, Pushmeet Kohli

    Abstract: This paper investigates recently proposed approaches for defending against adversarial examples and evaluating adversarial robustness. We motivate 'adversarial risk' as an objective for achieving models robust to worst-case inputs. We then frame commonly used attacks and evaluation metrics as defining a tractable surrogate objective to the true adversarial risk. This suggests that models may optim…

    Submitted 12 June, 2018; v1 submitted 15 February, 2018; originally announced February 2018.

  11. arXiv:1709.05380 [pdf, other]

    cs.AI cs.LG math.OC stat.ML

    The Uncertainty Bellman Equation and Exploration

    Authors: Brendan O'Donoghue, Ian Osband, Remi Munos, Volodymyr Mnih

    Abstract: We consider the exploration/exploitation problem in reinforcement learning. For exploitation, it is well known that the Bellman equation connects the value at any time-step to the expected value at subsequent time-steps. In this paper we consider a similar uncertainty Bellman equation (UBE), which connects the uncertainty at any time-step to the expected uncertainties at subsequent time-s…

    Submitted 22 October, 2018; v1 submitted 15 September, 2017; originally announced September 2017.

  12. arXiv:1611.01626 [pdf, other]

    cs.LG cs.AI math.OC stat.ML

    Combining policy gradient and Q-learning

    Authors: Brendan O'Donoghue, Remi Munos, Koray Kavukcuoglu, Volodymyr Mnih

    Abstract: Policy gradient is an efficient technique for improving a policy in a reinforcement learning setting. However, vanilla online variants are on-policy only and not able to take advantage of off-policy data. In this paper we describe a new technique that combines policy gradient with off-policy Q-learning, drawing experience from a replay buffer. This is motivated by making a connection between the f…

    Submitted 7 April, 2017; v1 submitted 5 November, 2016; originally announced November 2016.
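
Result 1 (arXiv:2110.15688, Variational Bayesian Optimistic Sampling) notes that, for the stochastic multi-armed bandit, its optimistic set includes the Thompson sampling policy. As a point of reference, here is a minimal sketch of classical Beta-Bernoulli Thompson sampling. It is not the paper's variational optimistic sampling algorithm; the Bernoulli reward model, the uniform Beta(1, 1) prior, and the function name `thompson_sampling` are illustrative assumptions.

```python
import random


def thompson_sampling(true_means, horizon, seed=0):
    """Beta-Bernoulli Thompson sampling on a simulated bandit; returns pull counts per arm.

    Assumption: each arm pays reward 1 with probability true_means[arm], else 0.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    alpha = [1.0] * n_arms  # Beta(1, 1) prior: alpha = successes + 1
    beta = [1.0] * n_arms   # beta = failures + 1
    pulls = [0] * n_arms
    for _ in range(horizon):
        # Draw one plausible mean per arm from its posterior, then act greedily on the draws.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
        arm = max(range(n_arms), key=lambda a: samples[a])
        reward = 1 if rng.random() < true_means[arm] else 0
        # Conjugate posterior update for the pulled arm only.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


print(thompson_sampling([0.2, 0.5, 0.8], horizon=2000))
```

Arms whose posteriors are still wide keep producing occasional high draws, so they continue to be tried until the evidence rules them out; this is the exploration/exploitation balance the abstract refers to.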
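
Result 2 (arXiv:2110.04629, The Neural Testbed) assesses agents on marginal predictions per input and on joint predictions across many inputs. The toy sketch below shows why the two can disagree: two hypothetical two-member ensembles assign identical per-input marginals yet very different joint probabilities. It uses log-likelihood as the score and is not the testbed's code or its actual metric; all names are assumptions.

```python
import math


def marginal_log_lik(probs):
    """Sum over inputs of the log ensemble-averaged probability of each observed label.

    probs[k][i] is the probability ensemble member k assigns to the observed label of input i.
    """
    n_members = len(probs)
    n_inputs = len(probs[0])
    return sum(
        math.log(sum(probs[k][i] for k in range(n_members)) / n_members)
        for i in range(n_inputs)
    )


def joint_log_lik(probs):
    """Log of the ensemble-averaged probability of all observed labels at once."""
    n_members = len(probs)
    return math.log(sum(math.prod(member) for member in probs) / n_members)


# Hypothetical ensembles: both average to 0.5 on each input.
correlated = [[0.9, 0.9], [0.1, 0.1]]
anti_correlated = [[0.9, 0.1], [0.1, 0.9]]

for name, agent in [("correlated", correlated), ("anti-correlated", anti_correlated)]:
    print(name, marginal_log_lik(agent), joint_log_lik(agent))
```

The marginal scores tie, while the joint score separates the agents, which is the kind of difference a purely marginal evaluation cannot see.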
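
Result 4 (arXiv:2106.00661, Reward is enough for convex MDPs) studies goals expressed as convex functions of the stationary distribution. A minimal sketch, assuming a small tabular MDP and using the discounted state-action occupancy $d$ as a stand-in for that distribution: standard RL corresponds to the linear loss $-\langle r, d\rangle$, whereas negative entropy $\sum_{s,a} d(s,a)\log d(s,a)$ is convex but not linear in $d$, so it is not of the form $\langle r, d\rangle$ for a fixed reward $r$. The helper names and the two-state example MDP are hypothetical.

```python
import math


def occupancy(P, pi, mu0, gamma=0.9, iters=1000):
    """Discounted state-action occupancy d(s, a) of policy pi in a tabular MDP.

    P[s][a][s2]: transition probability, pi[s][a]: action probability,
    mu0[s]: initial-state distribution. Iterates d_s = (1 - gamma) mu0 + gamma P_pi^T d_s,
    a fixed-point map that contracts at rate gamma.
    """
    n_s, n_a = len(P), len(P[0])
    d = list(mu0)
    for _ in range(iters):
        nxt = [(1 - gamma) * mu0[s2] for s2 in range(n_s)]
        for s in range(n_s):
            for a in range(n_a):
                for s2 in range(n_s):
                    nxt[s2] += gamma * d[s] * pi[s][a] * P[s][a][s2]
        d = nxt
    return [[d[s] * pi[s][a] for a in range(n_a)] for s in range(n_s)]


def linear_loss(d, r):
    """Standard RL as a linear (hence convex) loss to minimize: -<r, d>."""
    return -sum(r[s][a] * d[s][a] for s in range(len(d)) for a in range(len(d[0])))


def neg_entropy_loss(d):
    """A convex, non-linear loss: minimizing it spreads visitation across state-action pairs."""
    return sum(x * math.log(x) for row in d for x in row if x > 0)


# Hypothetical 2-state MDP: action a moves the agent deterministically to state a.
P = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
uniform = [[0.5, 0.5], [0.5, 0.5]]
d = occupancy(P, uniform, mu0=[1.0, 0.0])
r = [[1.0, 0.0], [0.0, 0.0]]
print(d, linear_loss(d, r), neg_entropy_loss(d))
```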
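
Result 7 (arXiv:1809.05042, Hamiltonian Descent Methods) builds its optimizers by discretizing conformal Hamiltonian dynamics, written in the usual form as $\dot x = \nabla k(p)$, $\dot p = -\nabla f(x) - \gamma p$, where $k$ is a kinetic energy. The sketch below uses one simple explicit update of that system, $p_{i+1} = \delta(p_i - \epsilon \nabla f(x_i))$ and $x_{i+1} = x_i + \epsilon \nabla k(p_{i+1})$ with $\delta = 1/(1 + \epsilon\gamma)$. Treat this scheme, the step sizes, and the two kinetic energies as assumptions rather than the paper's exact algorithms. With the classical $k(p) = \|p\|^2/2$ the update is a heavy-ball-style momentum method.

```python
import math


def hamiltonian_descent(grad_f, grad_k, x0, step=0.1, gamma=1.0, iters=1000):
    """Explicit discretization of dx/dt = grad_k(p), dp/dt = -grad_f(x) - gamma * p."""
    delta = 1.0 / (1.0 + step * gamma)  # damping factor from the -gamma * p friction term
    x = list(x0)
    p = [0.0] * len(x)  # start at rest
    for _ in range(iters):
        g = grad_f(x)
        # Momentum update: damped kick from the gradient of f.
        p = [delta * (pj - step * gj) for pj, gj in zip(p, g)]
        # Position update: move along the gradient of the kinetic energy.
        v = grad_k(p)
        x = [xj + step * vj for xj, vj in zip(x, v)]
    return x


def classical_grad_k(p):
    """k(p) = ||p||^2 / 2, so grad k(p) = p."""
    return list(p)


def relativistic_grad_k(p):
    """k(p) = sqrt(||p||^2 + 1) - 1; its gradient has norm below 1, bounding each x step."""
    norm_sq = sum(pj * pj for pj in p)
    return [pj / math.sqrt(norm_sq + 1.0) for pj in p]


def grad_quadratic(x):
    """Gradient of the test function f(x) = ||x||^2 / 2."""
    return list(x)


print(hamiltonian_descent(grad_quadratic, classical_grad_k, [5.0, -3.0]))
print(hamiltonian_descent(grad_quadratic, relativistic_grad_k, [5.0, -3.0]))
```

Swapping `grad_k` changes how momentum maps to movement; the abstract's point is that suitable choices of $k$ extend linear convergence beyond smooth, strongly convex $f$, which this quadratic example does not attempt to demonstrate.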
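
Result 11 (arXiv:1709.05380, The Uncertainty Bellman Equation and Exploration) describes an equation linking the uncertainty at one time-step to the expected uncertainties at subsequent ones. A minimal tabular sketch, assuming a known transition model, a fixed policy, and a given local uncertainty term $\nu(s,a)$, iterates $u(s,a) = \nu(s,a) + \gamma^2 \sum_{s',a'} P(s' \mid s,a)\,\pi(a' \mid s')\,u(s',a')$, where the $\gamma^2$ factor reflects discounting of a variance-like quantity. These modelling choices and the function name are assumptions, not the paper's exact formulation.

```python
def solve_ube(P, pi, nu, gamma=0.9, iters=2000):
    """Fixed-point iteration for a tabular uncertainty Bellman equation.

    P[s][a][s2]: transition probabilities, pi[s][a]: policy,
    nu[s][a]: local uncertainty. The update contracts at rate gamma**2.
    """
    n_s, n_a = len(P), len(P[0])
    u = [[0.0] * n_a for _ in range(n_s)]
    for _ in range(iters):
        # Policy-averaged uncertainty at each possible next state.
        v = [sum(pi[s2][a2] * u[s2][a2] for a2 in range(n_a)) for s2 in range(n_s)]
        u = [
            [nu[s][a] + gamma**2 * sum(P[s][a][s2] * v[s2] for s2 in range(n_s))
             for a in range(n_a)]
            for s in range(n_s)
        ]
    return u


# One state with a single self-looping action: u = nu / (1 - gamma**2).
print(solve_ube(P=[[[1.0]]], pi=[[1.0]], nu=[[1.0]]))
```

In an exploration setting, one option (an assumption here) is to perturb Q-values with noise scaled by $\sqrt{u(s,a)}$ before acting greedily, so that actions with more propagated uncertainty are tried more often.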
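
Result 12 (arXiv:1611.01626, Combining policy gradient and Q-learning) is motivated by a connection between the fixed points of regularized policy gradient and Q-values. One identity in that spirit, under assumptions stated here rather than taken from the truncated abstract: if the policy is a Boltzmann policy $\pi(a) \propto \exp(Q(a)/\alpha)$ with temperature $\alpha$, $H(\pi)$ is its entropy, and $V = \sum_a \pi(a) Q(a)$, then $\alpha(\log \pi(a) + H(\pi)) + V = Q(a)$ exactly, because $\alpha \log \pi(a) = Q(a) - \alpha \log \sum_b e^{Q(b)/\alpha}$ and $\alpha H(\pi) = -V + \alpha \log \sum_b e^{Q(b)/\alpha}$. The sketch checks this numerically for a single state; it is an illustration, not the paper's PGQ algorithm, and the function names are hypothetical.

```python
import math


def softmax_policy(q, alpha):
    """Boltzmann policy pi(a) proportional to exp(q[a] / alpha), computed stably."""
    m = max(q)
    w = [math.exp((qa - m) / alpha) for qa in q]
    z = sum(w)
    return [wa / z for wa in w]


def q_from_policy(pi, v, alpha):
    """Recover action values from a policy: alpha * (log pi(a) + H(pi)) + v."""
    entropy = -sum(pa * math.log(pa) for pa in pi if pa > 0)
    return [alpha * (math.log(pa) + entropy) + v for pa in pi]


q = [1.0, 2.0, 0.5]
alpha = 0.3
pi = softmax_policy(q, alpha)
v = sum(pa * qa for pa, qa in zip(pi, q))  # V = E_pi[Q]
print(q_from_policy(pi, v, alpha))
```

The identity means a softmax policy, together with a value estimate, implicitly carries Q-value estimates, which is what allows a Q-learning-style update to be applied to a policy-gradient learner.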