Showing 1–20 of 20 results for author: Ramponi, G

  1. arXiv:2506.07054

    cs.LG cs.AI

    Policy Gradient with Tree Search: Avoiding Local Optimas through Lookahead

    Authors: Uri Koren, Navdeep Kumar, Uri Gadot, Giorgia Ramponi, Kfir Yehuda Levy, Shie Mannor

    Abstract: Classical policy gradient (PG) methods in reinforcement learning frequently converge to suboptimal local optima, a challenge exacerbated in large or complex environments. This work investigates Policy Gradient with Tree Search (PGTS), an approach that integrates an $m$-step lookahead mechanism to enhance policy optimization. We provide theoretical analysis demonstrating that increasing the tree se…

    Submitted 8 June, 2025; originally announced June 2025.

  2. arXiv:2505.17610

    cs.LG

    Learning Equilibria from Data: Provably Efficient Multi-Agent Imitation Learning

    Authors: Till Freihaut, Luca Viano, Volkan Cevher, Matthieu Geist, Giorgia Ramponi

    Abstract: This paper provides the first expert sample complexity characterization for learning a Nash equilibrium from expert data in Markov Games. We show that a new quantity named the single policy deviation concentrability coefficient is unavoidable in the non-interactive imitation learning setting, and we provide an upper bound for behavioral cloning (BC) featuring such coefficient. BC exhibits substant…

    Submitted 23 May, 2025; originally announced May 2025.

  3. arXiv:2503.02735

    cs.LG

    Clustered KL-barycenter design for policy evaluation

    Authors: Simon Weissmann, Till Freihaut, Claire Vernade, Giorgia Ramponi, Leif Döring

    Abstract: In the context of stochastic bandit models, this article examines how to design sample-efficient behavior policies for the importance sampling evaluation of multiple target policies. From importance sampling theory, it is well established that sample efficiency is highly sensitive to the KL divergence between the target and importance sampling distributions. We first analyze a single behavior poli…

    Submitted 4 March, 2025; originally announced March 2025.

  4. arXiv:2502.09432

    cs.AI cs.LG

    Dual Formulation for Non-Rectangular Lp Robust Markov Decision Processes

    Authors: Navdeep Kumar, Adarsh Gupta, Maxence Mohamed Elfatihi, Giorgia Ramponi, Kfir Yehuda Levy, Shie Mannor

    Abstract: We study robust Markov decision processes (RMDPs) with non-rectangular uncertainty sets, which capture interdependencies across states unlike traditional rectangular models. While non-rectangular robust policy evaluation is generally NP-hard, even in approximation, we identify a powerful class of $L_p$-bounded uncertainty sets that avoid these complexity barriers due to their structural simplicity…

    Submitted 13 February, 2025; originally announced February 2025.

  5. arXiv:2411.15046

    cs.LG

    On Feasible Rewards in Multi-Agent Inverse Reinforcement Learning

    Authors: Till Freihaut, Giorgia Ramponi

    Abstract: In multi-agent systems, agent behavior is driven by utility functions that encapsulate their individual goals and interactions. Inverse Reinforcement Learning (IRL) seeks to uncover these utilities by analyzing expert behavior, offering insights into the underlying decision-making processes. However, multi-agent settings pose significant challenges, particularly when rewards are inferred from equi…

    Submitted 17 February, 2025; v1 submitted 22 November, 2024; originally announced November 2024.

    Comments: Currently under review

  6. arXiv:2410.18871

    cs.GT cs.MA

    Learning Collusion in Episodic, Inventory-Constrained Markets

    Authors: Paul Friedrich, Barna Pásztor, Giorgia Ramponi

    Abstract: Pricing algorithms have demonstrated the capability to learn tacit collusion that is largely unaddressed by current regulations. Their increasing use in markets, including oligopolistic industries with a history of collusion, calls for closer examination by competition authorities. In this paper, we extend the study of tacit collusion in learning algorithms from basic pricing games to more complex…

    Submitted 25 February, 2025; v1 submitted 24 October, 2024; originally announced October 2024.

    Comments: 34 pages, 18 figures. To appear at the AAMAS 2025 conference

  7. arXiv:2410.08868

    cs.LG stat.ML

    On the Convergence of Single-Timescale Actor-Critic

    Authors: Navdeep Kumar, Priyank Agrawal, Giorgia Ramponi, Kfir Yehuda Levy, Shie Mannor

    Abstract: We analyze the global convergence of the single-timescale actor-critic (AC) algorithm for the infinite-horizon discounted Markov Decision Processes (MDPs) with finite state spaces. To this end, we introduce an elegant analytical framework for handling complex, coupled recursions inherent in the algorithm. Leveraging this framework, we establish that the algorithm converges to an $ε$-close …

    Submitted 4 June, 2025; v1 submitted 11 October, 2024; originally announced October 2024.

    Comments: updated version, 27 pages

  8. arXiv:2406.18450

    cs.LG cs.AI

    Preference Elicitation for Offline Reinforcement Learning

    Authors: Alizée Pace, Bernhard Schölkopf, Gunnar Rätsch, Giorgia Ramponi

    Abstract: Applying reinforcement learning (RL) to real-world problems is often made challenging by the inability to interact with the environment and the difficulty of designing reward functions. Offline RL addresses the first challenge by considering access to an offline dataset of environment interactions labeled by the reward function. In contrast, Preference-based RL does not assume access to the reward…

    Submitted 28 February, 2025; v1 submitted 26 June, 2024; originally announced June 2024.

    Comments: ICLR 2025

  9. arXiv:2406.01575

    math.OC cs.AI cs.LG stat.ML

    Contextual Bilevel Reinforcement Learning for Incentive Alignment

    Authors: Vinzenz Thoma, Barna Pasztor, Andreas Krause, Giorgia Ramponi, Yifan Hu

    Abstract: The optimal policy in various real-world strategic decision-making problems depends both on the environmental configuration and exogenous events. For these settings, we introduce Contextual Bilevel Reinforcement Learning (CB-RL), a stochastic bilevel decision-making model, where the lower level consists of solving a contextual Markov Decision Process (CMDP). CB-RL can be viewed as a Stackelberg Ga…

    Submitted 8 December, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: 60 pages, 21 Figures

  10. arXiv:2402.15776

    cs.LG stat.ML

    Truly No-Regret Learning in Constrained MDPs

    Authors: Adrian Müller, Pragnya Alatur, Volkan Cevher, Giorgia Ramponi, Niao He

    Abstract: Constrained Markov decision processes (CMDPs) are a common way to model safety constraints in reinforcement learning. State-of-the-art methods for efficiently solving CMDPs are based on primal-dual algorithms. For these algorithms, all currently known regret bounds allow for error cancellations -- one can compensate for a constraint violation in one round with a strict constraint satisfaction in a…

    Submitted 19 July, 2024; v1 submitted 24 February, 2024; originally announced February 2024.

  11. arXiv:2310.07518

    cs.LG

    Exploiting Causal Graph Priors with Posterior Sampling for Reinforcement Learning

    Authors: Mirco Mutti, Riccardo De Santi, Marcello Restelli, Alexander Marx, Giorgia Ramponi

    Abstract: Posterior sampling allows exploitation of prior knowledge on the environment's transition dynamics to improve the sample efficiency of reinforcement learning. The prior is typically specified as a class of parametric distributions, the design of which can be cumbersome in practice, often resulting in the choice of uninformative priors. In this work, we propose a novel posterior sampling approach i…

    Submitted 8 April, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

  12. arXiv:2306.14799

    cs.LG cs.GT

    On Imitation in Mean-field Games

    Authors: Giorgia Ramponi, Pavel Kolev, Olivier Pietquin, Niao He, Mathieu Laurière, Matthieu Geist

    Abstract: We explore the problem of imitation learning (IL) in the context of mean-field games (MFGs), where the goal is to imitate the behavior of a population of agents following a Nash equilibrium policy according to some unknown payoff function. IL in MFGs presents new challenges compared to single-agent IL, particularly when both the reward function and the transition kernel depend on the population di…

    Submitted 26 June, 2023; originally announced June 2023.

  13. arXiv:2306.07749

    cs.LG cs.GT cs.MA

    Provably Learning Nash Policies in Constrained Markov Potential Games

    Authors: Pragnya Alatur, Giorgia Ramponi, Niao He, Andreas Krause

    Abstract: Multi-agent reinforcement learning (MARL) addresses sequential decision-making problems with multiple agents, where each agent optimizes its own objective. In many real-world instances, the agents may not only want to optimize their objectives, but also ensure safe behavior. For example, in traffic routing, each car (agent) aims to reach its destination quickly (objective) while avoiding collision…

    Submitted 13 June, 2023; originally announced June 2023.

    Comments: 30 pages

  14. arXiv:2306.07001

    cs.LG stat.ML

    Cancellation-Free Regret Bounds for Lagrangian Approaches in Constrained Markov Decision Processes

    Authors: Adrian Müller, Pragnya Alatur, Giorgia Ramponi, Niao He

    Abstract: Constrained Markov Decision Processes (CMDPs) are one of the common ways to model safe reinforcement learning problems, where constraint functions model the safety objectives. Lagrangian-based dual or primal-dual algorithms provide efficient methods for learning in CMDPs. For these algorithms, the currently known regret bounds in the finite-horizon setting allow for a "cancellation of errors"; one…

    Submitted 30 August, 2023; v1 submitted 12 June, 2023; originally announced June 2023.

  15. arXiv:2210.11137

    cs.LG cs.AI eess.SY

    Trust Region Policy Optimization with Optimal Transport Discrepancies: Duality and Algorithm for Continuous Actions

    Authors: Antonio Terpin, Nicolas Lanzetti, Batuhan Yardim, Florian Dörfler, Giorgia Ramponi

    Abstract: Policy Optimization (PO) algorithms have been proven particularly suited to handle the high-dimensionality of real-world continuous control tasks. In this context, Trust Region Policy Optimization methods represent a popular approach to stabilize the policy updates. These usually rely on the Kullback-Leibler (KL) divergence to limit the change in the policy. The Wasserstein distance represents a n…

    Submitted 20 October, 2022; originally announced October 2022.

    Comments: Accepted for presentation at, and publication in the proceedings of, the 36th Conference on Neural Information Processing Systems (NeurIPS 2022)

  16. arXiv:2210.04817

    cs.LG cs.CR

    Do you pay for Privacy in Online learning?

    Authors: Amartya Sanyal, Giorgia Ramponi

    Abstract: Online learning, in the mistake bound model, is one of the most fundamental concepts in learning theory. Differential privacy, instead, is the most widely used statistical concept of privacy in the machine learning community. It is thus clear that defining learning problems that are online differentially privately learnable is of great interest. In this paper, we pose the question on if the two pr…

    Submitted 10 October, 2022; originally announced October 2022.

    Comments: This is an updated version with i) clearer problem statements especially in proposed Theorem 1 and ii) clearer discussion of existing work especially Golowich and Livni (2021). Conference on Learning Theory. PMLR, 2022

  17. arXiv:2207.08645

    cs.LG cs.AI stat.ML

    Active Exploration for Inverse Reinforcement Learning

    Authors: David Lindner, Andreas Krause, Giorgia Ramponi

    Abstract: Inverse Reinforcement Learning (IRL) is a powerful paradigm for inferring a reward function from expert demonstrations. Many IRL algorithms require a known transition model and sometimes even a known expert policy, or they at least require access to a generative model. However, these assumptions are too strong for many real-world applications, where the environment can be accessed only through seq…

    Submitted 22 August, 2023; v1 submitted 18 July, 2022; originally announced July 2022.

    Comments: Presented at Conference on Neural Information Processing Systems (NeurIPS), 2022

  18. arXiv:2007.07812

    cs.LG stat.ML

    Inverse Reinforcement Learning from a Gradient-based Learner

    Authors: Giorgia Ramponi, Gianluca Drappo, Marcello Restelli

    Abstract: Inverse Reinforcement Learning addresses the problem of inferring an expert's reward function from demonstrations. However, in many applications, we not only have access to the expert's near-optimal behavior, but we also observe part of her learning process. In this paper, we propose a new algorithm for this setting, in which the goal is to recover the reward function being optimized by an agent,…

    Submitted 15 July, 2020; originally announced July 2020.

    Journal ref: Advances in Neural Information Processing Systems 33 (2020) 2458-2468

  19. arXiv:2007.07804

    cs.LG stat.ML

    Newton Optimization on Helmholtz Decomposition for Continuous Games

    Authors: Giorgia Ramponi, Marcello Restelli

    Abstract: Many learning problems involve multiple agents optimizing different interactive functions. In these problems, the standard policy gradient algorithms fail due to the non-stationarity of the setting and the different interests of each agent. In fact, algorithms must take into account the complex dynamics of these systems to guarantee rapid convergence towards a (local) Nash equilibrium. In this pap…

    Submitted 2 September, 2021; v1 submitted 15 July, 2020; originally announced July 2020.

    Comments: In 35th AAAI Conference on Artificial Intelligence (AAAI 2021)

    Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence 35 (2021) 11325-11333

  20. arXiv:1811.08295

    cs.LG stat.ML

    T-CGAN: Conditional Generative Adversarial Network for Data Augmentation in Noisy Time Series with Irregular Sampling

    Authors: Giorgia Ramponi, Pavlos Protopapas, Marco Brambilla, Ryan Janssen

    Abstract: In this paper we propose a data augmentation method for time series with irregular sampling, Time-Conditional Generative Adversarial Network (T-CGAN). Our approach is based on Conditional Generative Adversarial Networks (CGAN), where the generative step is implemented by a deconvolutional NN and the discriminative step by a convolutional NN. Both the generator and the discriminator are conditioned…

    Submitted 1 February, 2019; v1 submitted 20 November, 2018; originally announced November 2018.