Showing 1–13 of 13 results for author: Borsa, D

  1. arXiv:2206.08736  [pdf, other]

    stat.ML cs.LG

    Generalised Policy Improvement with Geometric Policy Composition

    Authors: Shantanu Thakoor, Mark Rowland, Diana Borsa, Will Dabney, Rémi Munos, André Barreto

    Abstract: We introduce a method for policy improvement that interpolates between the greedy approach of value-based reinforcement learning (RL) and the full planning approach typical of model-based RL. The new method builds on the concept of a geometric horizon model (GHM, also known as a gamma-model), which models the discounted state-visitation distribution of a given policy. We show that we can evaluate…

    Submitted 17 June, 2022; originally announced June 2022.

    Comments: ICML 2022

  2. arXiv:2202.09699  [pdf, other]

    cs.LG cs.AI stat.ML

    Selective Credit Assignment

    Authors: Veronica Chelu, Diana Borsa, Doina Precup, Hado van Hasselt

    Abstract: Efficient credit assignment is essential for reinforcement learning algorithms in both prediction and control settings. We describe a unified view on temporal-difference algorithms for selective credit assignment. These selective algorithms apply weightings to quantify the contribution of learning updates. We present insights into applying weightings to value-based learning and planning algorithms…

    Submitted 19 February, 2022; originally announced February 2022.

  3. arXiv:2105.05347  [pdf, other]

    cs.LG cs.AI stat.ML

    Return-based Scaling: Yet Another Normalisation Trick for Deep RL

    Authors: Tom Schaul, Georg Ostrovski, Iurii Kemaev, Diana Borsa

    Abstract: Scaling issues are mundane yet irritating for practitioners of reinforcement learning. Error scales vary across domains, tasks, and stages of learning; sometimes by many orders of magnitude. This can be detrimental to learning speed and stability, create interference between learning tasks, and necessitate substantial tuning. We revisit this topic for agents based on temporal-difference learning,…

    Submitted 11 May, 2021; originally announced May 2021.

  4. arXiv:2010.02255  [pdf, other]

    cs.AI cs.LG stat.ML

    Temporal Difference Uncertainties as a Signal for Exploration

    Authors: Sebastian Flennerhag, Jane X. Wang, Pablo Sprechmann, Francesco Visin, Alexandre Galashov, Steven Kapturowski, Diana L. Borsa, Nicolas Heess, Andre Barreto, Razvan Pascanu

    Abstract: An effective approach to exploration in reinforcement learning is to rely on an agent's uncertainty over the optimal policy, which can yield near-optimal exploration strategies in tabular settings. However, in non-tabular settings that involve function approximators, obtaining accurate uncertainty estimates is almost as challenging a problem. In this paper, we highlight that value estimates are ea…

    Submitted 1 July, 2021; v1 submitted 5 October, 2020; originally announced October 2020.

    Comments: 9 pages, 11 figures, 5 tables

  5. arXiv:2007.01839  [pdf, other]

    cs.LG cs.AI stat.ML

    Expected Eligibility Traces

    Authors: Hado van Hasselt, Sephora Madjiheurem, Matteo Hessel, David Silver, André Barreto, Diana Borsa

    Abstract: The question of how to determine which states and actions are responsible for a certain outcome is known as the credit assignment problem and remains a central research question in reinforcement learning and artificial intelligence. Eligibility traces enable efficient credit assignment to the recent sequence of states and actions experienced by the agent, but not to counterfactual sequences that c…

    Submitted 8 February, 2021; v1 submitted 3 July, 2020; originally announced July 2020.

    Comments: AAAI, distinguished paper award

  6. arXiv:1912.06910  [pdf, other]

    cs.LG cs.AI stat.ML

    Adapting Behaviour for Learning Progress

    Authors: Tom Schaul, Diana Borsa, David Ding, David Szepesvari, Georg Ostrovski, Will Dabney, Simon Osindero

    Abstract: Determining what experience to generate to best facilitate learning (i.e. exploration) is one of the distinguishing features and open challenges in reinforcement learning. The advent of distributed agents that interact with parallel instances of the environment has enabled larger scales and greater flexibility, but has not removed the need to tune exploration to the task, because the ideal data fo…

    Submitted 14 December, 2019; originally announced December 2019.

  7. arXiv:1910.07479  [pdf, other]

    cs.LG stat.ML

    Conditional Importance Sampling for Off-Policy Learning

    Authors: Mark Rowland, Anna Harutyunyan, Hado van Hasselt, Diana Borsa, Tom Schaul, Rémi Munos, Will Dabney

    Abstract: The principal contribution of this paper is a conceptual framework for off-policy reinforcement learning, based on conditional expectations of importance sampling ratios. This framework yields new perspectives and understanding of existing off-policy algorithms, and reveals a broad space of unexplored algorithms. We theoretically analyse this space, and concretely investigate several algorithms th…

    Submitted 30 July, 2020; v1 submitted 16 October, 2019; originally announced October 2019.

    Comments: AISTATS 2020 camera-ready version

  8. arXiv:1907.03687  [pdf, other]

    cs.LG cs.AI stat.ML

    General non-linear Bellman equations

    Authors: Hado van Hasselt, John Quan, Matteo Hessel, Zhongwen Xu, Diana Borsa, Andre Barreto

    Abstract: We consider a general class of non-linear Bellman equations. These open up a design space of algorithms that have interesting properties, which has two potential advantages. First, we can perhaps better model natural phenomena. For instance, hyperbolic discounting has been proposed as a mathematical model that matches human and animal data well, and can therefore be used to explain preference orde…

    Submitted 8 July, 2019; originally announced July 2019.

  9. arXiv:1904.11455  [pdf, other]

    cs.LG cs.AI stat.ML

    Ray Interference: a Source of Plateaus in Deep Reinforcement Learning

    Authors: Tom Schaul, Diana Borsa, Joseph Modayil, Razvan Pascanu

    Abstract: Rather than proposing a new method, this paper investigates an issue present in existing learning algorithms. We study the learning dynamics of reinforcement learning (RL), specifically a characteristic coupling between learning and data generation that arises because RL agents control their future data distribution. In the presence of function approximation, this coupling can lead to a problemati…

    Submitted 25 April, 2019; originally announced April 2019.

    Comments: Full version of RLDM abstract

  10. arXiv:1902.09996  [pdf, other]

    cs.AI cs.LG stat.ML

    The Termination Critic

    Authors: Anna Harutyunyan, Will Dabney, Diana Borsa, Nicolas Heess, Remi Munos, Doina Precup

    Abstract: In this work, we consider the problem of autonomously discovering behavioral abstractions, or options, for reinforcement learning agents. We propose an algorithm that focuses on the termination condition, as opposed to -- as is common -- the policy. The termination condition is usually trained to optimize a control objective: an option ought to terminate if another has better value. We offer a dif…

    Submitted 26 February, 2019; originally announced February 2019.

    Comments: AISTATS 2019

  11. arXiv:1812.07626  [pdf, other]

    cs.LG cs.AI stat.ML

    Universal Successor Features Approximators

    Authors: Diana Borsa, André Barreto, John Quan, Daniel Mankowitz, Rémi Munos, Hado van Hasselt, David Silver, Tom Schaul

    Abstract: The ability of a reinforcement learning (RL) agent to learn about many reward functions at the same time has many potential benefits, such as the decomposition of complex tasks into simpler ones, the exchange of information between tasks, and the reuse of skills. We focus on one aspect in particular, namely the ability to generalise to unseen tasks. Parametric generalisation relies on the interpol…

    Submitted 18 December, 2018; originally announced December 2018.

  12. arXiv:1706.06617  [pdf, other]

    cs.LG cs.AI stat.ML

    Observational Learning by Reinforcement Learning

    Authors: Diana Borsa, Bilal Piot, Rémi Munos, Olivier Pietquin

    Abstract: Observational learning is a type of learning that occurs as a function of observing, retaining and possibly replicating or imitating the behaviour of another agent. It is a core mechanism appearing in various instances of social learning and has been found to be employed in several intelligent species, including humans. In this paper, we investigate to what extent the explicit modelling of other a…

    Submitted 20 June, 2017; originally announced June 2017.

  13. arXiv:1506.03041  [pdf, other]

    cs.AI stat.ML

    The Wreath Process: A totally generative model of geometric shape based on nested symmetries

    Authors: Diana Borsa, Thore Graepel, Andrew Gordon

    Abstract: We consider the problem of modelling noisy but highly symmetric shapes that can be viewed as hierarchies of whole-part relationships in which higher level objects are composed of transformed collections of lower level objects. To this end, we propose the stochastic wreath process, a fully generative probabilistic model of drawings. Following Leyton's "Generative Theory of Shape", we represent shap…

    Submitted 9 June, 2015; originally announced June 2015.

    Comments: 10 pages (double-column), 60+ figures

    MSC Class: 20-XX