Showing 1–11 of 11 results for author: Klissarov, M

Searching in archive cs.
  1. arXiv:2506.14045

    cs.AI

    Discovering Temporal Structure: An Overview of Hierarchical Reinforcement Learning

    Authors: Martin Klissarov, Akhil Bagaria, Ziyan Luo, George Konidaris, Doina Precup, Marlos C. Machado

    Abstract: Developing agents capable of exploring, planning and learning in complex open-ended environments is a grand challenge in artificial intelligence (AI). Hierarchical reinforcement learning (HRL) offers a promising solution to this challenge by discovering and exploiting the temporal structure within a stream of experience. The strong appeal of the HRL framework has led to a rich and diverse body of…

    Submitted 16 June, 2025; originally announced June 2025.

  2. arXiv:2412.08542

    cs.AI cs.CL cs.LG

    MaestroMotif: Skill Design from Artificial Intelligence Feedback

    Authors: Martin Klissarov, Mikael Henaff, Roberta Raileanu, Shagun Sodhani, Pascal Vincent, Amy Zhang, Pierre-Luc Bacon, Doina Precup, Marlos C. Machado, Pierluca D'Oro

    Abstract: Describing skills in natural language has the potential to provide an accessible way to inject human knowledge about decision-making into an AI system. We present MaestroMotif, a method for AI-assisted skill design, which yields high-performing and adaptable agents. MaestroMotif leverages the capabilities of Large Language Models (LLMs) to effectively create and reuse skills. It first uses an LLM'…

    Submitted 11 December, 2024; originally announced December 2024.

  3. arXiv:2410.05656

    cs.AI

    On the Modeling Capabilities of Large Language Models for Sequential Decision Making

    Authors: Martin Klissarov, Devon Hjelm, Alexander Toshev, Bogdan Mazoure

    Abstract: Large pretrained models are showing increasingly better performance in reasoning and planning tasks across different modalities, opening the possibility to leverage them for complex sequential decision making problems. In this paper, we investigate the capabilities of Large Language Models (LLMs) for reinforcement learning (RL) across a diversity of interactive domains. We evaluate their ability t…

    Submitted 7 October, 2024; originally announced October 2024.

  4. arXiv:2402.04764

    cs.LG

    Code as Reward: Empowering Reinforcement Learning with VLMs

    Authors: David Venuto, Sami Nur Islam, Martin Klissarov, Doina Precup, Sherry Yang, Ankit Anand

    Abstract: Pre-trained Vision-Language Models (VLMs) are able to understand visual concepts, describe and decompose complex tasks into sub-tasks, and provide feedback on task completion. In this paper, we aim to leverage these capabilities to support the training of reinforcement learning (RL) agents. In principle, VLMs are well suited for this purpose, as they can naturally analyze image-based observations…

    Submitted 7 February, 2024; originally announced February 2024.

  5. arXiv:2310.00166

    cs.AI cs.LG

    Motif: Intrinsic Motivation from Artificial Intelligence Feedback

    Authors: Martin Klissarov, Pierluca D'Oro, Shagun Sodhani, Roberta Raileanu, Pierre-Luc Bacon, Pascal Vincent, Amy Zhang, Mikael Henaff

    Abstract: Exploring rich environments and evaluating one's actions without prior knowledge is immensely challenging. In this paper, we propose Motif, a general method to interface such prior knowledge from a Large Language Model (LLM) with an agent. Motif is based on the idea of grounding LLMs for decision-making without requiring them to interact with the environment: it elicits preferences from an LLM ove…

    Submitted 29 September, 2023; originally announced October 2023.

    Comments: The first two authors equally contributed - order decided by coin flip

  6. arXiv:2301.11181

    cs.LG cs.AI

    Deep Laplacian-based Options for Temporally-Extended Exploration

    Authors: Martin Klissarov, Marlos C. Machado

    Abstract: Selecting exploratory actions that generate a rich stream of experience for better learning is a fundamental challenge in reinforcement learning (RL). An approach to tackle this problem consists in selecting actions according to specific policies for an extended period of time, also known as options. A recent line of work to derive such exploratory options builds upon the eigenfunctions of the gra…

    Submitted 9 June, 2023; v1 submitted 26 January, 2023; originally announced January 2023.

  7. arXiv:2112.03097

    cs.LG cs.AI

    Flexible Option Learning

    Authors: Martin Klissarov, Doina Precup

    Abstract: Temporal abstraction in reinforcement learning (RL) offers the promise of improving generalization and knowledge transfer in complex environments by propagating information more efficiently over time. Although option learning was initially formulated in a way that allows updating many options simultaneously, using off-policy, intra-option learning (Sutton, Precup & Singh, 1999), many of the rece…

    Submitted 6 December, 2021; originally announced December 2021.

    Comments: NeurIPS 2021 Spotlight

  8. arXiv:2010.02474

    cs.LG cs.AI

    Reward Propagation Using Graph Convolutional Networks

    Authors: Martin Klissarov, Doina Precup

    Abstract: Potential-based reward shaping provides an approach for designing good reward functions, with the purpose of speeding up learning. However, automatically finding potential functions for complex environments is a difficult problem (in fact, of the same difficulty as learning a value function from scratch). We propose a new framework for learning potential functions by leveraging ideas from graph re…

    Submitted 1 November, 2020; v1 submitted 6 October, 2020; originally announced October 2020.

  9. arXiv:2001.00271

    cs.LG cs.AI stat.ML

    Options of Interest: Temporal Abstraction with Interest Functions

    Authors: Khimya Khetarpal, Martin Klissarov, Maxime Chevalier-Boisvert, Pierre-Luc Bacon, Doina Precup

    Abstract: Temporal abstraction refers to the ability of an agent to use behaviours of controllers which act for a limited, variable amount of time. The options framework describes such behaviours as consisting of a subset of states in which they can initiate, an internal policy and a stochastic termination condition. However, much of the subsequent work on option discovery has ignored the initiation set, be…

    Submitted 1 January, 2020; originally announced January 2020.

    Comments: To appear in Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20)

  10. arXiv:1712.00004

    cs.LG cs.AI

    Learnings Options End-to-End for Continuous Action Tasks

    Authors: Martin Klissarov, Pierre-Luc Bacon, Jean Harb, Doina Precup

    Abstract: We present new results on learning temporally extended actions for continuous tasks, using the options framework (Sutton et al. [1999b], Precup [2000]). In order to achieve this goal we work with the option-critic architecture (Bacon et al. [2017]) using a deliberation cost and train it with proximal policy optimization (Schulman et al. [2017]) instead of vanilla policy gradient. Results on Mujoco domains…

    Submitted 29 November, 2017; originally announced December 2017.

  11. arXiv:1709.04571

    cs.AI

    When Waiting is not an Option : Learning Options with a Deliberation Cost

    Authors: Jean Harb, Pierre-Luc Bacon, Martin Klissarov, Doina Precup

    Abstract: Recent work has shown that temporally extended actions (options) can be learned fully end-to-end as opposed to being specified in advance. While the problem of "how" to learn options is increasingly well understood, the question of "what" good options should be has remained elusive. We formulate our answer to what "good" options should be in the bounded rationality framework (Simon, 1957) through…

    Submitted 13 September, 2017; originally announced September 2017.