Search | arXiv e-print repository

Vision-Language Models as a Source of Rewards

Authors: Kate Baumli, Satinder Baveja, Feryal Behbahani, Harris Chan, Gheorghe Comanici, Sebastian Flennerhag, Maxime Gazeau, Kristian Holsheimer, Dan Horgan, Michael Laskin, Clare Lyle, Hussain Masoom, Kay McKinney, Volodymyr Mnih, Alexander Neitz, Dmitry Nikulin, Fabio Pardo, Jack Parker-Holder, John Quan, Tim Rocktäschel, Himanshu Sahni, Tom Schaul, Yannick Schroecker, Stephen Spencer, Richie Steigerwald , et al. (2 additional authors not shown)

Abstract: Building generalist agents that can accomplish many goals in rich open-ended environments is one of the research frontiers for reinforcement learning. A key limiting factor for building generalist agents with RL has been the need for a large number of reward functions for achieving different goals. We investigate the feasibility of using off-the-shelf vision-language models, or VLMs, as sources of… ▽ More Building generalist agents that can accomplish many goals in rich open-ended environments is one of the research frontiers for reinforcement learning. A key limiting factor for building generalist agents with RL has been the need for a large number of reward functions for achieving different goals. We investigate the feasibility of using off-the-shelf vision-language models, or VLMs, as sources of rewards for reinforcement learning agents. We show how rewards for visual achievement of a variety of language goals can be derived from the CLIP family of models, and used to train RL agents that can achieve a variety of language goals. We showcase this approach in two distinct visual domains and present a scaling trend showing how larger VLMs lead to more accurate rewards for visual goal achievement, which in turn produces more capable RL agents. △ Less

Submitted 12 July, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

Comments: 10 pages, 5 figures

arXiv:2210.14215 [pdf, other]

In-context Reinforcement Learning with Algorithm Distillation

Authors: Michael Laskin, Luyu Wang, Junhyuk Oh, Emilio Parisotto, Stephen Spencer, Richie Steigerwald, DJ Strouse, Steven Hansen, Angelos Filos, Ethan Brooks, Maxime Gazeau, Himanshu Sahni, Satinder Singh, Volodymyr Mnih

Abstract: We propose Algorithm Distillation (AD), a method for distilling reinforcement learning (RL) algorithms into neural networks by modeling their training histories with a causal sequence model. Algorithm Distillation treats learning to reinforcement learn as an across-episode sequential prediction problem. A dataset of learning histories is generated by a source RL algorithm, and then a causal transf… ▽ More We propose Algorithm Distillation (AD), a method for distilling reinforcement learning (RL) algorithms into neural networks by modeling their training histories with a causal sequence model. Algorithm Distillation treats learning to reinforcement learn as an across-episode sequential prediction problem. A dataset of learning histories is generated by a source RL algorithm, and then a causal transformer is trained by autoregressively predicting actions given their preceding learning histories as context. Unlike sequential policy prediction architectures that distill post-learning or expert sequences, AD is able to improve its policy entirely in-context without updating its network parameters. We demonstrate that AD can reinforcement learn in-context in a variety of environments with sparse rewards, combinatorial task structure, and pixel-based observations, and find that AD learns a more data-efficient RL algorithm than the one that generated the source data. △ Less

Submitted 25 October, 2022; originally announced October 2022.

arXiv:2206.04798 [pdf, other]

A*Net: A Scalable Path-based Reasoning Approach for Knowledge Graphs

Authors: Zhaocheng Zhu, Xinyu Yuan, Mikhail Galkin, Sophie Xhonneux, Ming Zhang, Maxime Gazeau, Jian Tang

Abstract: Reasoning on large-scale knowledge graphs has been long dominated by embedding methods. While path-based methods possess the inductive capacity that embeddings lack, their scalability is limited by the exponential number of paths. Here we present A*Net, a scalable path-based method for knowledge graph reasoning. Inspired by the A* algorithm for shortest path problems, our A*Net learns a priority f… ▽ More Reasoning on large-scale knowledge graphs has been long dominated by embedding methods. While path-based methods possess the inductive capacity that embeddings lack, their scalability is limited by the exponential number of paths. Here we present A*Net, a scalable path-based method for knowledge graph reasoning. Inspired by the A* algorithm for shortest path problems, our A*Net learns a priority function to select important nodes and edges at each iteration, to reduce time and memory footprint for both training and inference. The ratio of selected nodes and edges can be specified to trade off between performance and efficiency. Experiments on both transductive and inductive knowledge graph reasoning benchmarks show that A*Net achieves competitive performance with existing state-of-the-art path-based methods, while merely visiting 10% nodes and 10% edges at each iteration. On a million-scale dataset ogbl-wikikg2, A*Net not only achieves a new state-of-the-art result, but also converges faster than embedding methods. A*Net is the first path-based method for knowledge graph reasoning at such scale. △ Less

Submitted 8 November, 2023; v1 submitted 6 June, 2022; originally announced June 2022.

Comments: NeurIPS 2023

arXiv:2102.06229 [pdf, other]

Higher Order Generalization Error for First Order Discretization of Langevin Diffusion

Authors: Mufan Bill Li, Maxime Gazeau

Abstract: We propose a novel approach to analyze generalization error for discretizations of Langevin diffusion, such as the stochastic gradient Langevin dynamics (SGLD). For an $ε$ tolerance of expected generalization error, it is known that a first order discretization can reach this target if we run $Ω(ε^{-1} \log (ε^{-1}) )$ iterations with $Ω(ε^{-1})$ samples. In this article, we show that with additio… ▽ More We propose a novel approach to analyze generalization error for discretizations of Langevin diffusion, such as the stochastic gradient Langevin dynamics (SGLD). For an $ε$ tolerance of expected generalization error, it is known that a first order discretization can reach this target if we run $Ω(ε^{-1} \log (ε^{-1}) )$ iterations with $Ω(ε^{-1})$ samples. In this article, we show that with additional smoothness assumptions, even first order methods can achieve arbitrarily runtime complexity. More precisely, for each $N>0$, we provide a sufficient smoothness condition on the loss function such that a first order discretization can reach $ε$ expected generalization error given $Ω( ε^{-1/N} \log (ε^{-1}) )$ iterations with $Ω(ε^{-1})$ samples. △ Less

Submitted 11 February, 2021; originally announced February 2021.

arXiv:1902.08234 [pdf, other]

An Empirical Study of Large-Batch Stochastic Gradient Descent with Structured Covariance Noise

Authors: Yeming Wen, Kevin Luk, Maxime Gazeau, Guodong Zhang, Harris Chan, Jimmy Ba

Abstract: The choice of batch-size in a stochastic optimization algorithm plays a substantial role for both optimization and generalization. Increasing the batch-size used typically improves optimization but degrades generalization. To address the problem of improving generalization while maintaining optimal convergence in large-batch training, we propose to add covariance noise to the gradients. We demonst… ▽ More The choice of batch-size in a stochastic optimization algorithm plays a substantial role for both optimization and generalization. Increasing the batch-size used typically improves optimization but degrades generalization. To address the problem of improving generalization while maintaining optimal convergence in large-batch training, we propose to add covariance noise to the gradients. We demonstrate that the learning performance of our method is more accurately captured by the structure of the covariance matrix of the noise rather than by the variance of gradients. Moreover, over the convex-quadratic, we prove in theory that it can be characterized by the Frobenius norm of the noise matrix. Our empirical studies with standard deep learning model-architectures and datasets shows that our method not only improves generalization performance in large-batch training, but furthermore, does so in a way where the optimization performance remains desirable and the training duration is not elongated. △ Less

Submitted 28 February, 2020; v1 submitted 21 February, 2019; originally announced February 2019.

Journal ref: The 23rd International Conference on Artificial Intelligence and Statistics, 2020

arXiv:1810.13108 [pdf, other]

A general system of differential equations to model first order adaptive algorithms

Authors: André Belotto da Silva, Maxime Gazeau

Abstract: First order optimization algorithms play a major role in large scale machine learning. A new class of methods, called adaptive algorithms, were recently introduced to adjust iteratively the learning rate for each coordinate. Despite great practical success in deep learning, their behavior and performance on more general loss functions are not well understood. In this paper, we derive a non-autonom… ▽ More First order optimization algorithms play a major role in large scale machine learning. A new class of methods, called adaptive algorithms, were recently introduced to adjust iteratively the learning rate for each coordinate. Despite great practical success in deep learning, their behavior and performance on more general loss functions are not well understood. In this paper, we derive a non-autonomous system of differential equations, which is the continuous time limit of adaptive optimization methods. We prove global well-posedness of the system and we investigate the numerical time convergence of its forward Euler approximation. We study, furthermore, the convergence of its trajectories and give conditions under which the differential system, underlying all adaptive algorithms, is suitable for optimization. We discuss convergence to a critical point in the non-convex case and give conditions for the dynamics to avoid saddle points and local maxima. For convex and deterministic loss function, we introduce a suitable Lyapunov functional which allow us to study its rate of convergence. Several other properties of both the continuous and discrete systems are briefly discussed. The differential system studied in the paper is general enough to encompass many other classical algorithms (such as Heavy ball and Nesterov's accelerated method) and allow us to recover several known results for these algorithms. △ Less

Submitted 30 September, 2019; v1 submitted 31 October, 2018; originally announced October 2018.

arXiv:1807.02150 [pdf, other]

Scalable Recommender Systems through Recursive Evidence Chains

Authors: Elias Tragas, Calvin Luo, Maxime Gazeau, Kevin Luk, David Duvenaud

Abstract: Recommender systems can be formulated as a matrix completion problem, predicting ratings from user and item parameter vectors. Optimizing these parameters by subsampling data becomes difficult as the number of users and items grows. We develop a novel approach to generate all latent variables on demand from the ratings matrix itself and a fixed pool of parameters. We estimate missing ratings using… ▽ More Recommender systems can be formulated as a matrix completion problem, predicting ratings from user and item parameter vectors. Optimizing these parameters by subsampling data becomes difficult as the number of users and items grows. We develop a novel approach to generate all latent variables on demand from the ratings matrix itself and a fixed pool of parameters. We estimate missing ratings using chains of evidence that link them to a small set of prototypical users and items. Our model automatically addresses the cold-start and online learning problems by combining information across both users and items. We investigate the scaling behavior of this model, and demonstrate competitive results with respect to current matrix factorization techniques in terms of accuracy and convergence speed. △ Less

Submitted 5 July, 2018; originally announced July 2018.

Showing 1–7 of 7 results for author: Gazeau, M