
Showing 1–13 of 13 results for author: Ball, P J

Searching in archive cs.
  1. arXiv:2303.06614

    cs.LG cs.AI stat.ML

    Synthetic Experience Replay

    Authors: Cong Lu, Philip J. Ball, Yee Whye Teh, Jack Parker-Holder

    Abstract: A key theme in the past decade has been that when large neural networks and large datasets combine they can produce remarkable results. In deep reinforcement learning (RL), this paradigm is commonly made possible through experience replay, whereby a dataset of past experiences is used to train a policy or value function. However, unlike in supervised or self-supervised learning, an RL agent has to…

    Submitted 26 October, 2023; v1 submitted 12 March, 2023; originally announced March 2023.

    Comments: Published at NeurIPS, 2023

  2. arXiv:2302.02948

    cs.LG cs.AI

    Efficient Online Reinforcement Learning with Offline Data

    Authors: Philip J. Ball, Laura Smith, Ilya Kostrikov, Sergey Levine

    Abstract: Sample efficiency and exploration remain major challenges in online reinforcement learning (RL). A powerful approach that can be applied to address these issues is the inclusion of offline data, such as prior trajectories from a human expert or a sub-optimal exploration policy. Previous methods have relied on extensive modifications and additional complexity to ensure the effective use of this dat…

    Submitted 31 May, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

    Comments: Short Presentation at ICML 2023; to reproduce our results and use our codebase, see https://github.com/ikostrikov/rlpd

  3. arXiv:2210.12719

    cs.LG cs.AI

    Learning General World Models in a Handful of Reward-Free Deployments

    Authors: Yingchen Xu, Jack Parker-Holder, Aldo Pacchiano, Philip J. Ball, Oleh Rybkin, Stephen J. Roberts, Tim Rocktäschel, Edward Grefenstette

    Abstract: Building generally capable agents is a grand challenge for deep reinforcement learning (RL). To approach this challenge practically, we outline two key desiderata: 1) to facilitate generalization, exploration should be task agnostic; 2) to facilitate scalability, exploration policies should collect large quantities of data without costly centralized retraining. Combining these two properties, we i…

    Submitted 23 October, 2022; originally announced October 2022.

    Comments: To be published at NeurIPS 2022. Code and videos available at https://ycxuyingchen.github.io/cascade/

  4. arXiv:2207.09405

    cs.LG cs.AI

    Bayesian Generational Population-Based Training

    Authors: Xingchen Wan, Cong Lu, Jack Parker-Holder, Philip J. Ball, Vu Nguyen, Binxin Ru, Michael A. Osborne

    Abstract: Reinforcement learning (RL) offers the potential for training generally capable agents that can interact autonomously in the real world. However, one key limitation is the brittleness of RL algorithms to core hyperparameters and network architecture choice. Furthermore, non-stationarities such as evolving training data and increased agent complexity mean that different hyperparameters and architec…

    Submitted 19 July, 2022; originally announced July 2022.

    Comments: AutoML Conference 2022. 10 pages, 4 figures, 3 tables (28 pages, 10 figures, 7 tables including references and appendices)

  5. arXiv:2207.00986

    cs.LG cs.AI cs.CV

    Stabilizing Off-Policy Deep Reinforcement Learning from Pixels

    Authors: Edoardo Cetin, Philip J. Ball, Steve Roberts, Oya Celiktutan

    Abstract: Off-policy reinforcement learning (RL) from pixel observations is notoriously unstable. As a result, many successful algorithms must combine different domain-specific practices and auxiliary losses to learn meaningful behaviors in complex environments. In this work, we provide novel analysis demonstrating that these instabilities arise from performing temporal-difference learning with a convolutio…

    Submitted 3 July, 2022; originally announced July 2022.

    Comments: Short presentation at ICML 2022

  6. arXiv:2206.04779

    cs.LG cs.AI cs.CV stat.ML

    Challenges and Opportunities in Offline Reinforcement Learning from Visual Observations

    Authors: Cong Lu, Philip J. Ball, Tim G. J. Rudner, Jack Parker-Holder, Michael A. Osborne, Yee Whye Teh

    Abstract: Offline reinforcement learning has shown great promise in leveraging large pre-collected datasets for policy learning, allowing agents to forgo often-expensive online data collection. However, offline reinforcement learning from visual observations with continuous action spaces remains under-explored, with a limited understanding of the key challenges in this complex domain. In this paper, we esta…

    Submitted 6 July, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: Published at TMLR, 2023

  7. arXiv:2110.04135

    cs.LG cs.AI

    Revisiting Design Choices in Offline Model-Based Reinforcement Learning

    Authors: Cong Lu, Philip J. Ball, Jack Parker-Holder, Michael A. Osborne, Stephen J. Roberts

    Abstract: Offline reinforcement learning enables agents to leverage large pre-collected datasets of environment transitions to learn control policies, circumventing the need for potentially expensive or unsafe online data collection. Significant progress has been made recently in offline model-based reinforcement learning, approaches which leverage a learned dynamics model. This typically involves construct…

    Submitted 16 March, 2022; v1 submitted 8 October, 2021; originally announced October 2021.

    Comments: Spotlight @ ICLR 2022; Spotlight @ RL4RealLife Workshop ICML 2021

  8. arXiv:2104.05632

    cs.LG cs.AI

    Augmented World Models Facilitate Zero-Shot Dynamics Generalization From a Single Offline Environment

    Authors: Philip J. Ball, Cong Lu, Jack Parker-Holder, Stephen Roberts

    Abstract: Reinforcement learning from large-scale offline datasets provides us with the ability to learn policies without potentially unsafe or impractical exploration. Significant progress has been made in the past few years in dealing with the challenge of correcting for differing behavior between the data collection and learned policies. However, little attention has been paid to potentially changing dyn…

    Submitted 3 August, 2021; v1 submitted 12 April, 2021; originally announced April 2021.

    Comments: Accepted @ ICML 2021; Spotlight @ ICLR 2021 "Self-Supervision for Reinforcement Learning Workshop"

  9. arXiv:2101.11331

    cs.LG math.OC

    OffCon$^3$: What is state of the art anyway?

    Authors: Philip J. Ball, Stephen J. Roberts

    Abstract: Two popular approaches to model-free continuous control tasks are SAC and TD3. At first glance these approaches seem rather different; SAC aims to solve the entropy-augmented MDP by minimising the KL-divergence between a stochastic proposal policy and a hypothetical energy-based soft Q-function policy, whereas TD3 is derived from DPG, which uses a deterministic policy to perform policy gradient asce…

    Submitted 14 March, 2021; v1 submitted 27 January, 2021; originally announced January 2021.

  10. arXiv:2010.15187

    cs.LG cs.AI

    A Study on Efficiency in Continual Learning Inspired by Human Learning

    Authors: Philip J. Ball, Yingzhen Li, Angus Lamb, Cheng Zhang

    Abstract: Humans are efficient continual learning systems; we continually learn new skills from birth with finite cells and resources. Our learning is highly optimized both in terms of capacity and time while not suffering from catastrophic forgetting. In this work we study the efficiency of continual learning systems, taking inspiration from human learning. In particular, inspired by the mechanisms of slee…

    Submitted 28 October, 2020; originally announced October 2020.

    Comments: Accepted at NeurIPS 2020 BabyMind Workshop

  11. arXiv:2006.11911

    cs.LG stat.ML

    Towards Tractable Optimism in Model-Based Reinforcement Learning

    Authors: Aldo Pacchiano, Philip J. Ball, Jack Parker-Holder, Krzysztof Choromanski, Stephen Roberts

    Abstract: The principle of optimism in the face of uncertainty is prevalent throughout sequential decision making problems such as multi-armed bandits and reinforcement learning (RL). To be successful, an optimistic RL algorithm must over-estimate the true value function (optimism) but not by so much that it is inaccurate (estimation error). In the tabular setting, many state-of-the-art methods produce the…

    Submitted 3 December, 2021; v1 submitted 21 June, 2020; originally announced June 2020.

    Comments: Presented as a conference paper at UAI 2021

  12. arXiv:1909.10863

    cs.AI q-bio.QM

    Active inference: demystified and compared

    Authors: Noor Sajid, Philip J. Ball, Thomas Parr, Karl J. Friston

    Abstract: Active inference is a first-principles account of how autonomous agents operate in dynamic, non-stationary environments. This problem is also considered in reinforcement learning (RL), but limited work exists on comparing the two approaches on the same discrete-state environments. In this paper, we provide: 1) an accessible overview of the discrete-state formulation of active inference, highlightin…

    Submitted 30 October, 2020; v1 submitted 24 September, 2019; originally announced September 2019.

    Journal ref: Neural Computation 2021

  13. arXiv:1907.01040

    cs.LG cs.CY stat.ML

    The Sensitivity of Counterfactual Fairness to Unmeasured Confounding

    Authors: Niki Kilbertus, Philip J. Ball, Matt J. Kusner, Adrian Weller, Ricardo Silva

    Abstract: Causal approaches to fairness have seen substantial recent interest, both from the machine learning community and from wider parties interested in ethical prediction algorithms. In no small part, this has been due to the fact that causal models allow one to simultaneously leverage data and expert knowledge to remove discriminatory effects from predictions. However, one of the primary assumptions i…

    Submitted 1 July, 2019; originally announced July 2019.

    Comments: Published at UAI 2019