Showing 1–13 of 13 results for author: Ernst, D

  1. arXiv:2501.19116

    cs.LG stat.ML

    A Theoretical Justification for Asymmetric Actor-Critic Algorithms

    Authors: Gaspard Lambrechts, Damien Ernst, Aditya Mahajan

    Abstract: In reinforcement learning for partially observable environments, many successful algorithms have been developed within the asymmetric learning paradigm. This paradigm leverages additional state information available at training time for faster learning. Although the proposed learning objectives are usually theoretically sound, these methods still lack a precise theoretical justification for their…

    Submitted 6 June, 2025; v1 submitted 31 January, 2025; originally announced January 2025.

    Comments: 8 pages, 31 pages total

    Journal ref: International Conference on Machine Learning, 2025

  2. arXiv:2412.06655

    cs.LG stat.ML

    Off-Policy Maximum Entropy RL with Future State and Action Visitation Measures

    Authors: Adrien Bolland, Gaspard Lambrechts, Damien Ernst

    Abstract: We introduce a new maximum entropy reinforcement learning framework based on the distribution of states and actions visited by a policy. More precisely, an intrinsic reward function is added to the reward function of the Markov decision process that shall be controlled. For each state and action, this intrinsic reward is the relative entropy of the discounted distribution of states and actions (or…

    Submitted 9 December, 2024; originally announced December 2024.

  3. arXiv:2407.08415

    cs.LG stat.ML

    Parallelizing Autoregressive Generation with Variational State Space Models

    Authors: Gaspard Lambrechts, Yann Claes, Pierre Geurts, Damien Ernst

    Abstract: Attention-based models such as Transformers and recurrent models like state space models (SSMs) have emerged as successful methods for autoregressive sequence modeling. Although both enable parallel training, none enable parallel generation due to their autoregressiveness. We propose the variational SSM (VSSM), a variational autoencoder (VAE) where both the encoder and decoder are SSMs. Since samp…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 4 pages, 11 pages total, 3 figures

    Journal ref: ICML Workshop on Next Generation of Sequence Modeling Architectures, 2024

  4. arXiv:2402.00162

    cs.LG stat.ML

    Behind the Myth of Exploration in Policy Gradients

    Authors: Adrien Bolland, Gaspard Lambrechts, Damien Ernst

    Abstract: Policy-gradient algorithms are effective reinforcement learning methods for solving control problems. To compute near-optimal policies, it is essential in practice to include exploration terms in the learning objective. Although the effectiveness of these terms is usually justified by an intrinsic need to explore environments, we propose a novel analysis with the lens of numerical optimization. Tw…

    Submitted 21 January, 2025; v1 submitted 31 January, 2024; originally announced February 2024.

  5. arXiv:2305.06851

    cs.LG math.OC stat.ML

    Policy Gradient Algorithms Implicitly Optimize by Continuation

    Authors: Adrien Bolland, Gilles Louppe, Damien Ernst

    Abstract: Direct policy optimization in reinforcement learning is usually solved with policy-gradient algorithms, which optimize policy parameters via stochastic gradient ascent. This paper provides a new theoretical interpretation and justification of these algorithms. First, we formulate direct policy optimization in the optimization by continuation framework. The latter is a framework for optimizing nonc…

    Submitted 21 October, 2023; v1 submitted 11 May, 2023; originally announced May 2023.

    Comments: In Transactions on Machine Learning Research (2023)

  6. arXiv:2208.03520

    cs.LG stat.ML

    Recurrent networks, hidden states and beliefs in partially observable environments

    Authors: Gaspard Lambrechts, Adrien Bolland, Damien Ernst

    Abstract: Reinforcement learning aims to learn optimal policies from interaction with environments whose dynamics are unknown. Many methods rely on the approximation of a value function to derive near-optimal policies. In partially observable environments, these functions depend on the complete sequence of observations and past actions, called the history. In this work, we show empirically that recurrent ne…

    Submitted 6 August, 2022; originally announced August 2022.

    Comments: 12 pages, 28 pages total, 20 figures. Transactions on Machine Learning Research (2022)

    Journal ref: Transactions on Machine Learning Research, 2022

  7. arXiv:2006.01738

    cs.LG stat.ML

    Jointly Learning Environments and Control Policies with Projected Stochastic Gradient Ascent

    Authors: Adrien Bolland, Ioannis Boukas, Mathias Berger, Damien Ernst

    Abstract: We consider the joint design and control of discrete-time stochastic dynamical systems over a finite time horizon. We formulate the problem as a multi-step optimization problem under uncertainty seeking to identify a system design and a control policy that jointly maximize the expected sum of rewards collected over the time horizon considered. The transition function, the reward function and the p…

    Submitted 6 January, 2022; v1 submitted 2 June, 2020; originally announced June 2020.

    Journal ref: Journal of Artificial Intelligence Research 73 (2022) 117-171

  8. arXiv:1812.09113

    cs.LG cs.NE stat.ML

    Introducing Neuromodulation in Deep Neural Networks to Learn Adaptive Behaviours

    Authors: Nicolas Vecoven, Damien Ernst, Antoine Wehenkel, Guillaume Drion

    Abstract: Animals excel at adapting their intentions, attention, and actions to the environment, making them remarkably efficient at interacting with a rich, unpredictable and ever-changing external world, a property that intelligent machines currently lack. Such an adaptation property relies heavily on cellular neuromodulation, the biological mechanism that dynamically controls intrinsic properties of neur…

    Submitted 6 December, 2019; v1 submitted 21 December, 2018; originally announced December 2018.

  9. arXiv:1709.07796

    stat.ML cs.AI cs.LG

    On overfitting and asymptotic bias in batch reinforcement learning with partial observability

    Authors: Vincent Francois-Lavet, Guillaume Rabusseau, Joelle Pineau, Damien Ernst, Raphael Fonteneau

    Abstract: This paper provides an analysis of the tradeoff between asymptotic bias (suboptimality with unlimited data) and overfitting (additional suboptimality due to limited data) in the context of reinforcement learning with partial observability. Our theoretical analysis formally characterizes that while potentially increasing the asymptotic bias, a smaller state representation decreases the risk of over…

    Submitted 6 February, 2019; v1 submitted 22 September, 2017; originally announced September 2017.

    Comments: Accepted at the Journal of Artificial Intelligence Research (JAIR) - 31 pages

  10. arXiv:1406.7865

    stat.ML cs.CE cs.LG

    Simple connectome inference from partial correlation statistics in calcium imaging

    Authors: Antonio Sutera, Arnaud Joly, Vincent François-Lavet, Zixiao Aaron Qiu, Gilles Louppe, Damien Ernst, Pierre Geurts

    Abstract: In this work, we propose a simple yet effective solution to the problem of connectome inference in calcium imaging data. The proposed algorithm consists of two steps. First, processing the raw signals to detect neural peak activities. Second, inferring the degree of association between neurons from partial correlation statistics. This paper summarises the methodology that led us to win the Connect…

    Submitted 18 November, 2014; v1 submitted 30 June, 2014; originally announced June 2014.

    Journal ref: JMLR: Workshop and Conference Proceedings 46:23-35, 2015

  11. arXiv:1207.5259

    cs.LG stat.ML

    Optimal discovery with probabilistic expert advice: finite time analysis and macroscopic optimality

    Authors: Sebastien Bubeck, Damien Ernst, Aurelien Garivier

    Abstract: We consider an original problem that arises from the issue of security analysis of a power system and that we name optimal discovery with probabilistic expert advice. We address it with an algorithm based on the optimistic paradigm and on the Good-Turing missing mass estimator. We prove two different regret bounds on the performance of this algorithm under weak assumptions on the probabilistic exp…

    Submitted 29 March, 2013; v1 submitted 22 July, 2012; originally announced July 2012.

    Comments: arXiv admin note: substantial text overlap with arXiv:1110.5447

  12. arXiv:1207.5208

    cs.AI cs.LG stat.ML

    Meta-Learning of Exploration/Exploitation Strategies: The Multi-Armed Bandit Case

    Authors: Francis Maes, Damien Ernst, Louis Wehenkel

    Abstract: The exploration/exploitation (E/E) dilemma arises naturally in many subfields of Science. Multi-armed bandit problems formalize this dilemma in its canonical form. Most current research in this field focuses on generic solutions that can be applied to a wide range of problems. However, in practice, it is often the case that a form of prior information is available about the specific class of targe…

    Submitted 22 July, 2012; originally announced July 2012.

    Comments: 16 pages, Springer Selection of papers of ICAART'12

  13. arXiv:1112.4463

    math.OC stat.ML

    Scenario trees and policy selection for multistage stochastic programming using machine learning

    Authors: Boris Defourny, Damien Ernst, Louis Wehenkel

    Abstract: We propose a hybrid algorithmic strategy for complex stochastic optimization problems, which combines the use of scenario trees from multistage stochastic programming with machine learning techniques for learning a policy in the form of a statistical model, in the context of constrained vector-valued decisions. Such a policy allows one to run out-of-sample simulations over a large number of indepe…

    Submitted 19 December, 2011; originally announced December 2011.

    MSC Class: 90C15; 60G15

    ACM Class: G.1.6; I.2.6; G.3