Showing 1–24 of 24 results for author: Tirinzoni, A

  1. arXiv:2504.11054  [pdf, other]

    cs.LG

    Zero-Shot Whole-Body Humanoid Control via Behavioral Foundation Models

    Authors: Andrea Tirinzoni, Ahmed Touati, Jesse Farebrother, Mateusz Guzek, Anssi Kanervisto, Yingchen Xu, Alessandro Lazaric, Matteo Pirotta

    Abstract: Unsupervised reinforcement learning (RL) aims at pre-training agents that can solve a wide range of downstream tasks in complex environments. Despite recent advancements, existing approaches suffer from several limitations: they may require running an RL process on each downstream task to achieve a satisfactory performance, they may need access to datasets with good coverage or well-curated task-s…

    Submitted 15 April, 2025; originally announced April 2025.

    Comments: Published at ICLR 2025

  2. arXiv:2504.07896  [pdf, other]

    cs.LG cs.AI cs.RO

    Fast Adaptation with Behavioral Foundation Models

    Authors: Harshit Sikchi, Andrea Tirinzoni, Ahmed Touati, Yingchen Xu, Anssi Kanervisto, Scott Niekum, Amy Zhang, Alessandro Lazaric, Matteo Pirotta

    Abstract: Unsupervised zero-shot reinforcement learning (RL) has emerged as a powerful paradigm for pretraining behavioral foundation models (BFMs), enabling agents to solve a wide range of downstream tasks specified via reward functions in a zero-shot fashion, i.e., without additional test-time learning or planning. This is achieved by learning self-supervised task embeddings alongside corresponding near-o…

    Submitted 10 April, 2025; originally announced April 2025.

    Comments: 25 pages

  3. arXiv:2503.09817  [pdf, other]

    cs.LG cs.AI stat.ML

    Temporal Difference Flows

    Authors: Jesse Farebrother, Matteo Pirotta, Andrea Tirinzoni, Rémi Munos, Alessandro Lazaric, Ahmed Touati

    Abstract: Predictive models of the future are fundamental for an agent's ability to reason and plan. A common strategy learns a world model and unrolls it step-by-step at inference, where small errors can rapidly compound. Geometric Horizon Models (GHMs) offer a compelling alternative by directly making predictions of future states, avoiding cumulative inference errors. While GHMs can be conveniently learne…

    Submitted 12 March, 2025; originally announced March 2025.

  4. arXiv:2403.13097  [pdf, other]

    cs.LG cs.AI

    Simple Ingredients for Offline Reinforcement Learning

    Authors: Edoardo Cetin, Andrea Tirinzoni, Matteo Pirotta, Alessandro Lazaric, Yann Ollivier, Ahmed Touati

    Abstract: Offline reinforcement learning algorithms have proven effective on datasets highly connected to the target downstream task. Yet, leveraging a novel testbed (MOOD) in which trajectories come from heterogeneous sources, we show that existing methods struggle with diverse data: their performance considerably deteriorates as data collected for related but different tasks is simply added to the offline…

    Submitted 19 March, 2024; originally announced March 2024.

  5. arXiv:2311.05638  [pdf, ps, other]

    stat.ML cs.LG

    Towards Instance-Optimality in Online PAC Reinforcement Learning

    Authors: Aymen Al-Marjani, Andrea Tirinzoni, Emilie Kaufmann

    Abstract: Several recent works have proposed instance-dependent upper bounds on the number of episodes needed to identify, with probability $1-δ$, an $\varepsilon$-optimal policy in finite-horizon tabular Markov Decision Processes (MDPs). These upper bounds feature various complexity measures for the MDP, which are defined based on different notions of sub-optimality gaps. However, as of now, no lower bound…

    Submitted 31 October, 2023; originally announced November 2023.

  6. arXiv:2306.13601  [pdf, other]

    cs.LG

    Active Coverage for PAC Reinforcement Learning

    Authors: Aymen Al-Marjani, Andrea Tirinzoni, Emilie Kaufmann

    Abstract: Collecting and leveraging data with good coverage properties plays a crucial role in different aspects of reinforcement learning (RL), including reward-free exploration and offline learning. However, the notion of "good coverage" really depends on the application at hand, as data suitable for one context may not be so for another. In this paper, we formalize the problem of active coverage in episo…

    Submitted 23 June, 2023; originally announced June 2023.

    Comments: Accepted at COLT 2023

  7. arXiv:2302.03789  [pdf, ps, other]

    cs.LG

    Layered State Discovery for Incremental Autonomous Exploration

    Authors: Liyu Chen, Andrea Tirinzoni, Alessandro Lazaric, Matteo Pirotta

    Abstract: We study the autonomous exploration (AX) problem proposed by Lim & Auer (2012). In this setting, the objective is to discover a set of $ε$-optimal policies reaching a set $\mathcal{S}_L^{\rightarrow}$ of incrementally $L$-controllable states. We introduce a novel layered decomposition of the set of incrementally $L$-controllable states that is based on the iterative application of a state-expansio…

    Submitted 7 February, 2023; originally announced February 2023.

  8. arXiv:2212.09429  [pdf, ps, other]

    cs.LG stat.ML

    On the Complexity of Representation Learning in Contextual Linear Bandits

    Authors: Andrea Tirinzoni, Matteo Pirotta, Alessandro Lazaric

    Abstract: In contextual linear bandits, the reward function is assumed to be a linear combination of an unknown reward vector and a given embedding of context-arm pairs. In practice, the embedding is often learned at the same time as the reward vector, thus leading to an online representation learning problem. Existing approaches to representation learning in contextual bandits are either very generic (e.g.…

    Submitted 19 December, 2022; originally announced December 2022.

  9. arXiv:2210.13083  [pdf, other]

    cs.LG

    Scalable Representation Learning in Linear Contextual Bandits with Constant Regret Guarantees

    Authors: Andrea Tirinzoni, Matteo Papini, Ahmed Touati, Alessandro Lazaric, Matteo Pirotta

    Abstract: We study the problem of representation learning in stochastic contextual linear bandits. While the primary concern in this domain is usually to find realizable representations (i.e., those that allow predicting the reward function at any context-action pair exactly), it has been recently shown that representations with certain spectral properties (called HLS) may be more effective for the explorat…

    Submitted 24 October, 2022; originally announced October 2022.

    Comments: Accepted at NeurIPS 2022

  10. arXiv:2210.04946  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    Reaching Goals is Hard: Settling the Sample Complexity of the Stochastic Shortest Path

    Authors: Liyu Chen, Andrea Tirinzoni, Matteo Pirotta, Alessandro Lazaric

    Abstract: We study the sample complexity of learning an $ε$-optimal policy in the Stochastic Shortest Path (SSP) problem. We first derive sample complexity bounds when the learner has access to a generative model. We show that there exists a worst-case SSP instance with $S$ states, $A$ actions, minimum cost $c_{\min}$, and maximum expected cost of the optimal policy over all states $B_{\star}$, where any al…

    Submitted 10 October, 2022; originally announced October 2022.

  11. arXiv:2207.05852  [pdf, other]

    cs.LG

    Optimistic PAC Reinforcement Learning: the Instance-Dependent View

    Authors: Andrea Tirinzoni, Aymen Al-Marjani, Emilie Kaufmann

    Abstract: Optimistic algorithms have been extensively studied for regret minimization in episodic tabular MDPs, both from a minimax and an instance-dependent view. However, for the PAC RL problem, where the goal is to identify a near-optimal policy with high probability, little is known about their instance-dependent sample complexity. A negative result of Wagenmaker et al. (2021) suggests that optimistic s…

    Submitted 12 July, 2022; originally announced July 2022.

    Comments: arXiv admin note: text overlap with arXiv:2203.09251

  12. arXiv:2205.10936  [pdf, other]

    cs.LG stat.ML

    On Elimination Strategies for Bandit Fixed-Confidence Identification

    Authors: Andrea Tirinzoni, Rémy Degenne

    Abstract: Elimination algorithms for bandit identification, which prune the plausible correct answers sequentially until only one remains, are computationally convenient since they reduce the problem size over time. However, existing elimination strategies are often not fully adaptive (they update their sampling rule infrequently) and are not easy to extend to combinatorial settings, where the set of answer…

    Submitted 24 October, 2022; v1 submitted 22 May, 2022; originally announced May 2022.

  13. arXiv:2203.09251  [pdf, other]

    cs.LG stat.ML

    Near Instance-Optimal PAC Reinforcement Learning for Deterministic MDPs

    Authors: Andrea Tirinzoni, Aymen Al-Marjani, Emilie Kaufmann

    Abstract: In probably approximately correct (PAC) reinforcement learning (RL), an agent is required to identify an $ε$-optimal policy with probability $1-δ$. While minimax optimal algorithms exist for this problem, its instance-dependent complexity remains elusive in episodic Markov decision processes (MDPs). In this paper, we propose the first nearly matching (up to a horizon squared factor and logarithmic…

    Submitted 24 October, 2022; v1 submitted 17 March, 2022; originally announced March 2022.

  14. arXiv:2111.01479  [pdf, other]

    cs.AI cs.LG math.ST

    Dealing With Misspecification In Fixed-Confidence Linear Top-m Identification

    Authors: Clémence Réda, Andrea Tirinzoni, Rémy Degenne

    Abstract: We study the problem of the identification of m arms with largest means under a fixed error rate $δ$ (fixed-confidence Top-m identification), for misspecified linear bandit models. This problem is motivated by practical applications, especially in medicine and recommendation systems, where linear models are popular due to their simplicity and the existence of efficient algorithms, but in which dat…

    Submitted 2 November, 2021; originally announced November 2021.

    Comments: Virtual conference

  15. arXiv:2110.14798  [pdf, other]

    cs.LG

    Reinforcement Learning in Linear MDPs: Constant Regret and Representation Selection

    Authors: Matteo Papini, Andrea Tirinzoni, Aldo Pacchiano, Marcello Restelli, Alessandro Lazaric, Matteo Pirotta

    Abstract: We study the role of the representation of state-action value functions in regret minimization in finite-horizon Markov Decision Processes (MDPs) with linear structure. We first derive a necessary condition on the representation, called universally spanning optimal features (UNISOFT), to achieve constant regret in any MDP with linear reward function. This result encompasses the well-known settings…

    Submitted 27 October, 2021; originally announced October 2021.

    Comments: Accepted at NeurIPS 2021

  16. arXiv:2106.13013  [pdf, ps, other]

    cs.LG

    A Fully Problem-Dependent Regret Lower Bound for Finite-Horizon MDPs

    Authors: Andrea Tirinzoni, Matteo Pirotta, Alessandro Lazaric

    Abstract: We derive a novel asymptotic problem-dependent lower-bound for regret minimization in finite-horizon tabular Markov Decision Processes (MDPs). While, similar to prior work (e.g., for ergodic MDPs), the lower-bound is the solution to an optimization problem, our derivation reveals the need for an additional constraint on the visitation distribution over state-action pairs that explicitly accounts f…

    Submitted 24 June, 2021; originally announced June 2021.

  17. arXiv:2105.08834  [pdf, other]

    cs.LG

    Meta-Reinforcement Learning by Tracking Task Non-stationarity

    Authors: Riccardo Poiani, Andrea Tirinzoni, Marcello Restelli

    Abstract: Many real-world domains are subject to a structured non-stationarity which affects the agent's goals and the environmental dynamics. Meta-reinforcement learning (RL) has been shown successful for training agents that quickly adapt to related tasks. However, most of the existing meta-RL algorithms for non-stationary domains either make strong assumptions on the task generation process or require sa…

    Submitted 18 May, 2021; originally announced May 2021.

    Comments: To appear at IJCAI 2021

  18. arXiv:2104.03781  [pdf, other]

    cs.LG

    Leveraging Good Representations in Linear Contextual Bandits

    Authors: Matteo Papini, Andrea Tirinzoni, Marcello Restelli, Alessandro Lazaric, Matteo Pirotta

    Abstract: The linear contextual bandit literature is mostly focused on the design of efficient learning algorithms for a given representation. However, a contextual bandit problem may admit multiple linear representations, each one with different characteristics that directly impact the regret of the learning algorithm. In particular, recent works showed that there exist "good" representations for which con…

    Submitted 8 April, 2021; originally announced April 2021.

  19. arXiv:2010.12247  [pdf, other]

    cs.LG stat.ML

    An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits

    Authors: Andrea Tirinzoni, Matteo Pirotta, Marcello Restelli, Alessandro Lazaric

    Abstract: In the contextual linear bandit setting, algorithms built on the optimism principle fail to exploit the structure of the problem and have been shown to be asymptotically suboptimal. In this paper, we follow recent approaches of deriving asymptotically optimal algorithms from problem-dependent regret lower bounds and we introduce a novel algorithm improving over the state-of-the-art along multiple…

    Submitted 20 November, 2020; v1 submitted 23 October, 2020; originally announced October 2020.

    Comments: To appear at NeurIPS 2020. V2: clarified dependencies in the worst-case regret bound

  20. arXiv:2007.00722  [pdf, other]

    cs.LG stat.ML

    Sequential Transfer in Reinforcement Learning with a Generative Model

    Authors: Andrea Tirinzoni, Riccardo Poiani, Marcello Restelli

    Abstract: We are interested in how to design reinforcement learning agents that provably reduce the sample complexity for learning new tasks by transferring knowledge from previously-solved ones. The availability of solutions to related problems poses a fundamental trade-off: whether to seek policies that are expected to achieve high (yet sub-optimal) performance in the new task immediately or whether to se…

    Submitted 1 July, 2020; originally announced July 2020.

    Comments: ICML 2020

  21. arXiv:2005.11593  [pdf, other]

    cs.LG stat.ML

    A Novel Confidence-Based Algorithm for Structured Bandits

    Authors: Andrea Tirinzoni, Alessandro Lazaric, Marcello Restelli

    Abstract: We study finite-armed stochastic bandits where the rewards of each arm might be correlated to those of other arms. We introduce a novel phased algorithm that exploits the given structure to build confidence sets over the parameters of the true bandit problem and rapidly discard all sub-optimal arms. In particular, unlike standard bandit algorithms with no structure, we show that the number of time…

    Submitted 23 May, 2020; originally announced May 2020.

    Comments: AISTATS 2020

  22. arXiv:1909.04115  [pdf, other]

    cs.LG cs.AI stat.ML

    Gradient-Aware Model-based Policy Search

    Authors: Pierluca D'Oro, Alberto Maria Metelli, Andrea Tirinzoni, Matteo Papini, Marcello Restelli

    Abstract: Traditional model-based reinforcement learning approaches learn a model of the environment dynamics without explicitly considering how it will be used by the agent. In the presence of misspecified model classes, this can lead to poor estimates, as some relevant available information is ignored. In this paper, we introduce a novel model-based policy search approach that exploits the knowledge of th…

    Submitted 20 November, 2019; v1 submitted 9 September, 2019; originally announced September 2019.

  23. arXiv:1907.07384  [pdf, other]

    cs.LG stat.ML

    Feature Selection via Mutual Information: New Theoretical Insights

    Authors: Mario Beraha, Alberto Maria Metelli, Matteo Papini, Andrea Tirinzoni, Marcello Restelli

    Abstract: Mutual information has been successfully adopted in filter feature-selection methods to assess both the relevancy of a subset of features in predicting the target variable and the redundancy with respect to other variables. However, existing algorithms are mostly heuristic and do not offer any guarantee on the proposed solution. In this paper, we provide novel theoretical results showing that cond…

    Submitted 17 July, 2019; originally announced July 2019.

    Comments: Accepted for presentation at the International Joint Conference on Neural Networks (IJCNN) 2019

  24. arXiv:1805.10886  [pdf, other]

    cs.LG stat.ML

    Importance Weighted Transfer of Samples in Reinforcement Learning

    Authors: Andrea Tirinzoni, Andrea Sessa, Matteo Pirotta, Marcello Restelli

    Abstract: We consider the transfer of experience samples (i.e., tuples < s, a, s', r >) in reinforcement learning (RL), collected from a set of source tasks to improve the learning process in a given target task. Most of the related approaches focus on selecting the most relevant source samples for solving the target task, but then all the transferred samples are used without considering anymore the discrep…

    Submitted 28 May, 2018; originally announced May 2018.

    Comments: Accepted at ICML 2018