Code as Reward: Empowering Reinforcement Learning with VLMs

Venuto, David; Islam, Sami Nur; Klissarov, Martin; Precup, Doina; Yang, Sherry; Anand, Ankit

arXiv:2402.04764 (cs)

[Submitted on 7 Feb 2024]

Abstract: Pre-trained Vision-Language Models (VLMs) are able to understand visual concepts, describe and decompose complex tasks into sub-tasks, and provide feedback on task completion. In this paper, we aim to leverage these capabilities to support the training of reinforcement learning (RL) agents. In principle, VLMs are well suited for this purpose, as they can naturally analyze image-based observations and provide feedback (reward) on learning progress. However, inference in VLMs is computationally expensive, so querying them frequently to compute rewards would significantly slow down the training of an RL agent. To address this challenge, we propose a framework named Code as Reward (VLM-CaR). VLM-CaR produces dense reward functions from VLMs through code generation, thereby significantly reducing the computational burden of querying the VLM directly. We show that the dense rewards generated through our approach are very accurate across a diverse set of discrete and continuous environments, and can be more effective in training RL policies than the original sparse environment rewards.
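The abstract does not show the generated programs, so the following is only a minimal sketch of what a code-based dense reward might look like, assuming a toy grid-world observation given as a small RGB image. Every name here (`find_color`, `agent_next_to_key`, `key_picked_up`, `dense_reward`), the two sub-tasks, the colors, and the reward values are hypothetical illustrations, not the authors' implementation. The point it tries to convey is that once such a program exists, the reward can be computed at every step with cheap image checks rather than a VLM query.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]

# Hypothetical colors for a toy grid world (assumptions, not from the paper).
AGENT: Pixel = (255, 0, 0)
KEY: Pixel = (255, 255, 0)
EMPTY: Pixel = (0, 0, 0)


def find_color(obs: Image, color: Pixel) -> Optional[Tuple[int, int]]:
    """Return (row, col) of the first pixel equal to `color`, or None."""
    for r, row in enumerate(obs):
        for c, px in enumerate(row):
            if px == color:
                return (r, c)
    return None


def agent_next_to_key(obs: Image) -> bool:
    """Hypothetical sub-task 1: agent is in a cell adjacent to the key."""
    agent = find_color(obs, AGENT)
    key = find_color(obs, KEY)
    if agent is None or key is None:
        return False
    # Manhattan distance of exactly 1 means the cells share an edge.
    return abs(agent[0] - key[0]) + abs(agent[1] - key[1]) == 1


def key_picked_up(obs: Image) -> bool:
    """Hypothetical sub-task 2: the key no longer appears in the image."""
    return find_color(obs, KEY) is None


def dense_reward(obs: Image) -> float:
    """Reward the furthest sub-task reached; values are illustrative."""
    if key_picked_up(obs):
        return 1.0
    if agent_next_to_key(obs):
        return 0.5
    return 0.0
```

In a training loop, a function like `dense_reward` would be called on each observation in place of (or in addition to) the sparse environment reward; the expensive model is needed only once, to write the program, under the assumptions stated above.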

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2402.04764 [cs.LG]
	(or arXiv:2402.04764v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2402.04764

Submission history

From: David Venuto
[v1] Wed, 7 Feb 2024 11:27:45 UTC (2,489 KB)
