Showing 1–8 of 8 results for author: Bazerque, J A

  1. arXiv:2502.20462  [pdf, other]

    eess.SY

    Cooperative Multi-Agent Assignment over Stochastic Graphs via Constrained Reinforcement Learning

    Authors: Leopoldo Agorio, Sean Van Alen, Santiago Paternain, Miguel Calvo-Fullana, Juan Andres Bazerque

    Abstract: Constrained multi-agent reinforcement learning offers the framework to design scalable and almost surely feasible solutions for teams of agents operating in dynamic environments to carry out conflicting tasks. We address the challenges of multi-agent coordination through an unconventional formulation in which the dual variables are not driven to convergence but are free to cycle, enabling agents t…

    Submitted 27 February, 2025; originally announced February 2025.

    Comments: 15 pages, 5 figures, submitted to IEEE Transactions on Automatic Control

  2. arXiv:2406.01782  [pdf, other]

    eess.SY cs.AI cs.LG cs.MA

    Multi-agent assignment via state augmented reinforcement learning

    Authors: Leopoldo Agorio, Sean Van Alen, Miguel Calvo-Fullana, Santiago Paternain, Juan Andres Bazerque

    Abstract: We address the conflicting requirements of a multi-agent assignment problem through constrained reinforcement learning, emphasizing the inadequacy of standard regularization techniques for this purpose. Instead, we recur to a state augmentation approach in which the oscillation of dual variables is exploited by agents to alternate between tasks. In addition, we coordinate the actions of the multip…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 12 pages, 3 figures, 6th Annual Conference on Learning for Dynamics and Control

    MSC Class: 93E35

    Journal ref: Proceedings of Machine Learning Research vol 242:1–12, 2024. 6th Annual Conference on Learning for Dynamics and Control

  3. arXiv:2401.12849  [pdf, other]

    cs.LG eess.SY

    Learning safety critics via a non-contractive binary bellman operator

    Authors: Agustin Castellano, Hancheng Min, Juan Andrés Bazerque, Enrique Mallada

    Abstract: The inability to naturally enforce safety in Reinforcement Learning (RL), with limited failures, is a core challenge impeding its use in real-world applications. One notion of safety of vast practical relevance is the ability to avoid (unsafe) regions of the state space. Though such a safety goal can be captured by an action-value-like function, a.k.a. safety critics, the associated operator lacks…

    Submitted 23 January, 2024; originally announced January 2024.

  4. arXiv:2010.12993  [pdf, other]

    cs.LG eess.SP stat.ML

    Multi-task Supervised Learning via Cross-learning

    Authors: Juan Cervino, Juan Andres Bazerque, Miguel Calvo-Fullana, Alejandro Ribeiro

    Abstract: In this paper we consider a problem known as multi-task learning, consisting of fitting a set of classifier or regression functions intended for solving different tasks. In our novel formulation, we couple the parameters of these functions, so that they learn in their task specific domains while staying close to each other. This facilitates cross-fertilization in which data collected across differ…

    Submitted 26 May, 2021; v1 submitted 24 October, 2020; originally announced October 2020.

  5. arXiv:2010.08443  [pdf, other]

    cs.LG eess.SY

    Policy Gradient for Continuing Tasks in Non-stationary Markov Decision Processes

    Authors: Santiago Paternain, Juan Andres Bazerque, Alejandro Ribeiro

    Abstract: Reinforcement learning considers the problem of finding policies that maximize an expected cumulative reward in a Markov decision process with unknown transition probabilities. In this paper we consider the problem of finding optimal policies assuming that they belong to a reproducing kernel Hilbert space (RKHS). To that end we compute unbiased stochastic gradients of the value function which we u…

    Submitted 16 October, 2020; originally announced October 2020.

  6. arXiv:2010.02122  [pdf, other]

    eess.SY math.OC

    Quadratic approximate dynamic programming for scheduling water resources: a case study

    Authors: Agustin Castellano, Camila Martínez, Pablo Monzón, Juan Andrés Bazerque, Andrés Ferragut, Fernando Paganini

    Abstract: We address the problem of scheduling water resources in a power system via approximate dynamic programming. To this goal, we model a finite horizon economic dispatch problem with convex stage cost and affine dynamics, and consider a quadratic approximation of the value functions. Evaluating the achieved policy entails solving a quadratic program at each timestep, while value function fitting can be ca…

    Submitted 5 October, 2020; originally announced October 2020.

  7. Multi-task Reinforcement Learning in Reproducing Kernel Hilbert Spaces via Cross-learning

    Authors: Juan Cervino, Juan Andres Bazerque, Miguel Calvo-Fullana, Alejandro Ribeiro

    Abstract: Reinforcement learning (RL) is a framework to optimize a control policy using rewards that are revealed by the system as a response to a control action. In its standard form, RL involves a single agent that uses its policy to accomplish a specific task. These methods require large amounts of reward samples to achieve good performance, and may not generalize well when the task is modified, even if…

    Submitted 26 August, 2020; originally announced August 2020.

  8. arXiv:1807.11274  [pdf, other]

    eess.SY

    Stochastic Policy Gradient Ascent in Reproducing Kernel Hilbert Spaces

    Authors: Santiago Paternain, Juan Andrés Bazerque, Austin Small, Alejandro Ribeiro

    Abstract: Reinforcement learning consists of finding policies that maximize an expected cumulative long-term reward in a Markov decision process with unknown transition probabilities and instantaneous rewards. In this paper, we consider the problem of finding such optimal policies while assuming they are continuous functions belonging to a reproducing kernel Hilbert space (RKHS). To learn the optimal policy…

    Submitted 30 July, 2018; originally announced July 2018.