Showing 1–18 of 18 results for author: Flennerhag, S

  1. arXiv:2312.09187

    cs.LG

    Vision-Language Models as a Source of Rewards

    Authors: Kate Baumli, Satinder Baveja, Feryal Behbahani, Harris Chan, Gheorghe Comanici, Sebastian Flennerhag, Maxime Gazeau, Kristian Holsheimer, Dan Horgan, Michael Laskin, Clare Lyle, Hussain Masoom, Kay McKinney, Volodymyr Mnih, Alexander Neitz, Dmitry Nikulin, Fabio Pardo, Jack Parker-Holder, John Quan, Tim Rocktäschel, Himanshu Sahni, Tom Schaul, Yannick Schroecker, Stephen Spencer, Richie Steigerwald, et al. (2 additional authors not shown)

    Abstract: Building generalist agents that can accomplish many goals in rich open-ended environments is one of the research frontiers for reinforcement learning. A key limiting factor for building generalist agents with RL has been the need for a large number of reward functions for achieving different goals. We investigate the feasibility of using off-the-shelf vision-language models, or VLMs, as sources of…

    Submitted 12 July, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: 10 pages, 5 figures

  2. arXiv:2306.10587

    cs.LG cs.AI stat.ML

    Acceleration in Policy Optimization

    Authors: Veronica Chelu, Tom Zahavy, Arthur Guez, Doina Precup, Sebastian Flennerhag

    Abstract: We work towards a unifying paradigm for accelerating policy optimization methods in reinforcement learning (RL) by integrating foresight in the policy improvement step via optimistic and adaptive updates. Leveraging the connection between policy iteration and policy gradient methods, we view policy optimization algorithms as iteratively solving a sequence of surrogate objectives, local lower bound…

    Submitted 5 September, 2023; v1 submitted 18 June, 2023; originally announced June 2023.

  3. arXiv:2304.03995

    cs.NE cs.LG

    Discovering Attention-Based Genetic Algorithms via Meta-Black-Box Optimization

    Authors: Robert Tjarko Lange, Tom Schaul, Yutian Chen, Chris Lu, Tom Zahavy, Valentin Dalibard, Sebastian Flennerhag

    Abstract: Genetic algorithms constitute a family of black-box optimization algorithms, which take inspiration from the principles of biological evolution. While they provide a general-purpose tool for optimization, their particular instantiations can be heuristic and motivated by loose biological intuition. In this work we explore a fundamentally different approach: Given a sufficiently flexible parametriza…

    Submitted 8 April, 2023; originally announced April 2023.

    Comments: 14 pages, 31 figures

  4. arXiv:2302.01275

    cs.LG

    ReLOAD: Reinforcement Learning with Optimistic Ascent-Descent for Last-Iterate Convergence in Constrained MDPs

    Authors: Ted Moskovitz, Brendan O'Donoghue, Vivek Veeriah, Sebastian Flennerhag, Satinder Singh, Tom Zahavy

    Abstract: In recent years, Reinforcement Learning (RL) has been applied to real-world problems with increasing success. Such applications often require to put constraints on the agent's behavior. Existing algorithms for constrained RL (CRL) rely on gradient descent-ascent, but this approach comes with a caveat. While these algorithms are guaranteed to converge on average, they do not guarantee last-iterate…

    Submitted 5 March, 2023; v1 submitted 2 February, 2023; originally announced February 2023.

  5. arXiv:2301.03236

    cs.LG cs.AI math.OC

    Optimistic Meta-Gradients

    Authors: Sebastian Flennerhag, Tom Zahavy, Brendan O'Donoghue, Hado van Hasselt, András György, Satinder Singh

    Abstract: We study the connection between gradient-based meta-learning and convex optimisation. We observe that gradient descent with momentum is a special case of meta-gradients, and building on recent results in optimisation, we prove convergence rates for meta-learning in the single task setting. While a meta-learned update rule can yield faster convergence up to constant factor, it is not sufficient fo…

    Submitted 9 January, 2023; originally announced January 2023.

  6. arXiv:2211.11260

    cs.NE cs.AI

    Discovering Evolution Strategies via Meta-Black-Box Optimization

    Authors: Robert Tjarko Lange, Tom Schaul, Yutian Chen, Tom Zahavy, Valentin Dalibard, Chris Lu, Satinder Singh, Sebastian Flennerhag

    Abstract: Optimizing functions without access to gradients is the remit of black-box methods such as evolution strategies. While highly general, their learning dynamics are often times heuristic and inflexible - exactly the limitations that meta-learning can address. Hence, we propose to discover effective update rules for evolution strategies via meta-learning. Concretely, our approach employs a search str…

    Submitted 2 March, 2023; v1 submitted 21 November, 2022; originally announced November 2022.

    Comments: 25 pages, 21 figures

    Journal ref: 11th International Conference on Learning Representations, ICLR 2023

  7. arXiv:2210.12448

    cs.LG

    Probing Transfer in Deep Reinforcement Learning without Task Engineering

    Authors: Andrei A. Rusu, Sebastian Flennerhag, Dushyant Rao, Razvan Pascanu, Raia Hadsell

    Abstract: We evaluate the use of original game curricula supported by the Atari 2600 console as a heterogeneous transfer benchmark for deep reinforcement learning agents. Game designers created curricula using combinations of several discrete modifications to the basic versions of games such as Space Invaders, Breakout and Freeway, making them progressively more challenging for human players. By formally or…

    Submitted 22 October, 2022; originally announced October 2022.

  8. arXiv:2209.06159

    cs.LG

    Meta-Gradients in Non-Stationary Environments

    Authors: Jelena Luketina, Sebastian Flennerhag, Yannick Schroecker, David Abel, Tom Zahavy, Satinder Singh

    Abstract: Meta-gradient methods (Xu et al., 2018; Zahavy et al., 2020) offer a promising solution to the problem of hyperparameter selection and adaptation in non-stationary reinforcement learning problems. However, the properties of meta-gradients in such environments have not been systematically studied. In this work, we bring new clarity to meta-gradients in non-stationary environments. Concretely, we as…

    Submitted 13 September, 2022; originally announced September 2022.

    Comments: 16 pages, 9 figures, CoLLAs 2022

  9. arXiv:2205.13521

    cs.AI cs.LG

    Discovering Policies with DOMiNO: Diversity Optimization Maintaining Near Optimality

    Authors: Tom Zahavy, Yannick Schroecker, Feryal Behbahani, Kate Baumli, Sebastian Flennerhag, Shaobo Hou, Satinder Singh

    Abstract: Finding different solutions to the same problem is a key aspect of intelligence associated with creativity and adaptation to novel situations. In reinforcement learning, a set of diverse policies can be useful for exploration, transfer, hierarchy, and robustness. We propose DOMiNO, a method for Diversity Optimization Maintaining Near Optimality. We formalize the problem as a Constrained Markov Dec…

    Submitted 3 February, 2023; v1 submitted 26 May, 2022; originally announced May 2022.

  10. arXiv:2109.10781

    cs.LG cs.AI cs.NE stat.ML

    Introducing Symmetries to Black Box Meta Reinforcement Learning

    Authors: Louis Kirsch, Sebastian Flennerhag, Hado van Hasselt, Abram Friesen, Junhyuk Oh, Yutian Chen

    Abstract: Meta reinforcement learning (RL) attempts to discover new RL algorithms automatically from environment interaction. In so-called black-box approaches, the policy and the learning algorithm are jointly represented by a single neural network. These methods are very flexible, but they tend to underperform in terms of generalisation to new, unseen environments. In this paper, we explore the role of sy…

    Submitted 5 June, 2022; v1 submitted 22 September, 2021; originally announced September 2021.

    Comments: AAAI 2022

  11. arXiv:2109.04504

    cs.LG cs.AI stat.ML

    Bootstrapped Meta-Learning

    Authors: Sebastian Flennerhag, Yannick Schroecker, Tom Zahavy, Hado van Hasselt, David Silver, Satinder Singh

    Abstract: Meta-learning empowers artificial intelligence to increase its efficiency by learning how to learn. Unlocking this potential involves overcoming a challenging meta-optimisation problem. We propose an algorithm that tackles this problem by letting the meta-learner teach itself. The algorithm first bootstraps a target from the meta-learner, then optimises the meta-learner by minimising the distance…

    Submitted 16 March, 2022; v1 submitted 9 September, 2021; originally announced September 2021.

    Comments: Published at ICLR 2022. 37 pages, 19 figures, 9 tables

  12. arXiv:2106.00669

    cs.AI cs.LG stat.ML

    Discovering Diverse Nearly Optimal Policies with Successor Features

    Authors: Tom Zahavy, Brendan O'Donoghue, Andre Barreto, Volodymyr Mnih, Sebastian Flennerhag, Satinder Singh

    Abstract: Finding different solutions to the same problem is a key aspect of intelligence associated with creativity and adaptation to novel situations. In reinforcement learning, a set of diverse policies can be useful for exploration, transfer, hierarchy, and robustness. We propose Diverse Successive Policies, a method for discovering policies that are diverse in the space of Successor Features, while ass…

    Submitted 4 January, 2022; v1 submitted 1 June, 2021; originally announced June 2021.

  13. arXiv:2010.02255

    cs.AI cs.LG stat.ML

    Temporal Difference Uncertainties as a Signal for Exploration

    Authors: Sebastian Flennerhag, Jane X. Wang, Pablo Sprechmann, Francesco Visin, Alexandre Galashov, Steven Kapturowski, Diana L. Borsa, Nicolas Heess, Andre Barreto, Razvan Pascanu

    Abstract: An effective approach to exploration in reinforcement learning is to rely on an agent's uncertainty over the optimal policy, which can yield near-optimal exploration strategies in tabular settings. However, in non-tabular settings that involve function approximators, obtaining accurate uncertainty estimates is almost as challenging a problem. In this paper, we highlight that value estimates are ea…

    Submitted 1 July, 2021; v1 submitted 5 October, 2020; originally announced October 2020.

    Comments: 9 pages, 11 figures, 5 tables

  14. arXiv:2004.03445

    cs.LG q-fin.CP q-fin.PM q-fin.TR stat.ML

    QuantNet: Transferring Learning Across Systematic Trading Strategies

    Authors: Adriano Koshiyama, Sebastian Flennerhag, Stefano B. Blumberg, Nick Firoozye, Philip Treleaven

    Abstract: Systematic financial trading strategies account for over 80% of trade volume in equities and a large chunk of the foreign exchange market. In spite of the availability of data from multiple markets, current approaches in trading rely mainly on learning trading strategies per individual market. In this paper, we take a step towards developing fully end-to-end global trading strategies that leverage…

    Submitted 30 June, 2020; v1 submitted 7 April, 2020; originally announced April 2020.

  15. arXiv:1909.00025

    cs.LG cs.NE stat.ML

    Meta-Learning with Warped Gradient Descent

    Authors: Sebastian Flennerhag, Andrei A. Rusu, Razvan Pascanu, Francesco Visin, Hujun Yin, Raia Hadsell

    Abstract: Learning an efficient update rule from data that promotes rapid learning of new tasks from the same distribution remains an open problem in meta-learning. Typically, previous works have approached this issue either by attempting to train a neural network that directly produces updates or by attempting to learn better initialisations or scaling factors for a gradient-based update rule. Both of thes…

    Submitted 18 February, 2020; v1 submitted 30 August, 2019; originally announced September 2019.

    Comments: 28 pages, 13 figures, 3 tables. Published as a conference paper at ICLR 2020

  16. arXiv:1905.09796

    cs.LG cs.AI stat.ML

    Augmenting correlation structures in spatial data using deep generative models

    Authors: Konstantin Klemmer, Adriano Koshiyama, Sebastian Flennerhag

    Abstract: State-of-the-art deep learning methods have shown a remarkable capacity to model complex data domains, but struggle with geospatial data. In this paper, we introduce SpaceGAN, a novel generative model for geospatial domains that learns neighbourhood structures through spatial conditioning. We propose to enhance spatial representation beyond mere spatial coordinates, by conditioning each data point…

    Submitted 23 May, 2019; originally announced May 2019.

  17. arXiv:1812.01054

    cs.LG cs.AI cs.NE stat.ML

    Transferring Knowledge across Learning Processes

    Authors: Sebastian Flennerhag, Pablo G. Moreno, Neil D. Lawrence, Andreas Damianou

    Abstract: In complex transfer learning scenarios new tasks might not be tightly linked to previous tasks. Approaches that transfer information contained only in the final parameters of a source model will therefore struggle. Instead, transfer learning at a higher level of abstraction is needed. We propose Leap, a framework that achieves this by transferring knowledge across learning processes. We associate…

    Submitted 22 March, 2019; v1 submitted 3 December, 2018; originally announced December 2018.

    Comments: Published as a conference paper at ICLR 2019; 23 pages, 8 figures, 6 tables

  18. arXiv:1805.08574

    cs.LG cs.NE stat.ML

    Breaking the Activation Function Bottleneck through Adaptive Parameterization

    Authors: Sebastian Flennerhag, Hujun Yin, John Keane, Mark Elliot

    Abstract: Standard neural network architectures are non-linear only by virtue of a simple element-wise activation function, making them both brittle and excessively large. In this paper, we consider methods for making the feed-forward layer more flexible while preserving its basic structure. We develop simple drop-in replacements that learn to adapt their parameterization conditional on the input, thereby i…

    Submitted 22 November, 2018; v1 submitted 22 May, 2018; originally announced May 2018.

    Comments: Published as a conference paper at NeurIPS (NIPS) 2018