Showing 1–3 of 3 results for author: Goldie, A D

  1. arXiv:2505.20659  [pdf, ps, other]

    cs.LG

    An Optimisation Framework for Unsupervised Environment Design

    Authors: Nathan Monette, Alistair Letcher, Michael Beukman, Matthew T. Jackson, Alexander Rutherford, Alexander D. Goldie, Jakob N. Foerster

    Abstract: For reinforcement learning agents to be deployed in high-risk settings, they must achieve a high level of robustness to unfamiliar scenarios. One method for improving robustness is unsupervised environment design (UED), a suite of methods aiming to maximise an agent's generalisability across configurations of an environment. In this work, we study UED from an optimisation perspective, providing st…

    Submitted 9 July, 2025; v1 submitted 26 May, 2025; originally announced May 2025.

    Comments: Reinforcement Learning Conference 2025

  2. arXiv:2412.17113  [pdf, other]

    cs.LG math.OC

    Adam on Local Time: Addressing Nonstationarity in RL with Relative Adam Timesteps

    Authors: Benjamin Ellis, Matthew T. Jackson, Andrei Lupu, Alexander D. Goldie, Mattie Fellows, Shimon Whiteson, Jakob Foerster

    Abstract: In reinforcement learning (RL), it is common to apply techniques used broadly in machine learning such as neural network function approximators and momentum-based optimizers. However, such tools were largely developed for supervised learning rather than nonstationary RL, leading practitioners to adopt target networks, clipped policy updates, and other RL-specific implementation tricks to combat th…

    Submitted 22 December, 2024; originally announced December 2024.

  3. arXiv:2407.07082  [pdf, other]

    cs.LG cs.AI

    Can Learned Optimization Make Reinforcement Learning Less Difficult?

    Authors: Alexander David Goldie, Chris Lu, Matthew Thomas Jackson, Shimon Whiteson, Jakob Nicolaus Foerster

    Abstract: While reinforcement learning (RL) holds great potential for decision making in the real world, it suffers from a number of unique difficulties which often need specific consideration. In particular: it is highly non-stationary; suffers from high degrees of plasticity loss; and requires exploration to prevent premature convergence to local optima and maximize return. In this paper, we consider whet…

    Submitted 15 April, 2025; v1 submitted 9 July, 2024; originally announced July 2024.

    Comments: Added metadata for NeurIPS 2024

    Journal ref: Advances in Neural Information Processing Systems 37 (2024) 5454-5497