-
Offline Risk-sensitive RL with Partial Observability to Enhance Performance in Human-Robot Teaming
Authors:
Giorgio Angelotti,
Caroline P. C. Chanel,
Adam H. M. Pinto,
Christophe Lounis,
Corentin Chauffaut,
Nicolas Drougard
Abstract:
The integration of physiological computing into mixed-initiative human-robot interaction systems offers valuable advantages in autonomous task allocation by incorporating real-time features as human state observations into the decision-making system. This approach may alleviate the cognitive load on human operators by intelligently allocating mission tasks between agents. Nevertheless, accommodati…
▽ More
The integration of physiological computing into mixed-initiative human-robot interaction systems offers valuable advantages in autonomous task allocation by incorporating real-time features as human state observations into the decision-making system. This approach may alleviate the cognitive load on human operators by intelligently allocating mission tasks between agents. Nevertheless, accommodating a diverse pool of human participants with varying physiological and behavioral measurements presents a substantial challenge. To address this, resorting to a probabilistic framework becomes necessary, given the inherent uncertainty and partial observability on the human's state. Recent research suggests to learn a Partially Observable Markov Decision Process (POMDP) model from a data set of previously collected experiences that can be solved using Offline Reinforcement Learning (ORL) methods. In the present work, we not only highlight the potential of partially observable representations and physiological measurements to improve human operator state estimation and performance, but also enhance the overall mission effectiveness of a human-robot team. Importantly, as the fixed data set may not contain enough information to fully represent complex stochastic processes, we propose a method to incorporate model uncertainty, thus enabling risk-sensitive sequential decision-making. Experiments were conducted with a group of twenty-six human participants within a simulated robot teleoperation environment, yielding empirical evidence of the method's efficacy. The obtained adaptive task allocation policy led to statistically significant higher scores than the one that was used to collect the data set, allowing for generalization across diverse participants also taking into account risk-sensitive metrics.
△ Less
Submitted 8 February, 2024;
originally announced February 2024.
-
Modular zk-Rollup On-Demand
Authors:
Thomas Lavaur,
Jonathan Detchart,
Jérôme Lacan,
Caroline P. C. Chanel
Abstract:
The rapid expansion of the use of blockchain-based systems often leads to a choice between customizable private blockchains and more secure, scalable and decentralized but expensive public blockchains. This choice represents the trade-off between privacy and customization at a low cost and security, scalability, and a large user base but at a high cost. In order to improve the scalability of secur…
▽ More
The rapid expansion of the use of blockchain-based systems often leads to a choice between customizable private blockchains and more secure, scalable and decentralized but expensive public blockchains. This choice represents the trade-off between privacy and customization at a low cost and security, scalability, and a large user base but at a high cost. In order to improve the scalability of secure public blockchains while enabling privacy and cost reduction, zk-rollups, a layer 2 solution, appear to be a promising avenue. This paper explores the benefits of zk-rollups, including improved privacy, as well as their potential to support transactions designed for specific applications. We propose an innovative design that allows multiple zk-rollups to co-exist on the same smart contracts, simplifying their creation and customization. We then evaluate the first implementation of our system highlighting a low overhead on existing transaction types and on proof generation while strongly decreasing the cost of new transaction types and drastically reducing zk-rollup creation costs.
△ Less
Submitted 5 June, 2023;
originally announced June 2023.
-
Data Augmentation through Expert-guided Symmetry Detection to Improve Performance in Offline Reinforcement Learning
Authors:
Giorgio Angelotti,
Nicolas Drougard,
Caroline P. C. Chanel
Abstract:
Offline estimation of the dynamical model of a Markov Decision Process (MDP) is a non-trivial task that greatly depends on the data available in the learning phase. Sometimes the dynamics of the model is invariant with respect to some transformations of the current state and action. Recent works showed that an expert-guided pipeline relying on Density Estimation methods as Deep Neural Network base…
▽ More
Offline estimation of the dynamical model of a Markov Decision Process (MDP) is a non-trivial task that greatly depends on the data available in the learning phase. Sometimes the dynamics of the model is invariant with respect to some transformations of the current state and action. Recent works showed that an expert-guided pipeline relying on Density Estimation methods as Deep Neural Network based Normalizing Flows effectively detects this structure in deterministic environments, both categorical and continuous-valued. The acquired knowledge can be exploited to augment the original data set, leading eventually to a reduction in the distributional shift between the true and the learned model. Such data augmentation technique can be exploited as a preliminary process to be executed before adopting an Offline Reinforcement Learning architecture, increasing its performance. In this work we extend the paradigm to also tackle non-deterministic MDPs, in particular, 1) we propose a detection threshold in categorical environments based on statistical distances, and 2) we show that the former results lead to a performance improvement when solving the learned MDP and then applying the optimized policy in the real environment.
△ Less
Submitted 12 April, 2023; v1 submitted 18 December, 2021;
originally announced December 2021.
-
Expert-Guided Symmetry Detection in Markov Decision Processes
Authors:
Giorgio Angelotti,
Nicolas Drougard,
Caroline P. C. Chanel
Abstract:
Learning a Markov Decision Process (MDP) from a fixed batch of trajectories is a non-trivial task whose outcome's quality depends on both the amount and the diversity of the sampled regions of the state-action space. Yet, many MDPs are endowed with invariant reward and transition functions with respect to some transformations of the current state and action. Being able to detect and exploit these…
▽ More
Learning a Markov Decision Process (MDP) from a fixed batch of trajectories is a non-trivial task whose outcome's quality depends on both the amount and the diversity of the sampled regions of the state-action space. Yet, many MDPs are endowed with invariant reward and transition functions with respect to some transformations of the current state and action. Being able to detect and exploit these structures could benefit not only the learning of the MDP but also the computation of its subsequent optimal control policy. In this work we propose a paradigm, based on Density Estimation methods, that aims to detect the presence of some already supposed transformations of the state-action space for which the MDP dynamics is invariant. We tested the proposed approach in a discrete toroidal grid environment and in two notorious environments of OpenAI's Gym Learning Suite. The results demonstrate that the model distributional shift is reduced when the dataset is augmented with the data obtained by using the detected symmetries, allowing for a more thorough and data-efficient learning of the transition functions.
△ Less
Submitted 19 November, 2021;
originally announced November 2021.
-
An Offline Risk-aware Policy Selection Method for Bayesian Markov Decision Processes
Authors:
Giorgio Angelotti,
Nicolas Drougard,
Caroline Ponzoni Carvalho Chanel
Abstract:
In Offline Model Learning for Planning and in Offline Reinforcement Learning, the limited data set hinders the estimate of the Value function of the relative Markov Decision Process (MDP). Consequently, the performance of the obtained policy in the real world is bounded and possibly risky, especially when the deployment of a wrong policy can lead to catastrophic consequences. For this reason, seve…
▽ More
In Offline Model Learning for Planning and in Offline Reinforcement Learning, the limited data set hinders the estimate of the Value function of the relative Markov Decision Process (MDP). Consequently, the performance of the obtained policy in the real world is bounded and possibly risky, especially when the deployment of a wrong policy can lead to catastrophic consequences. For this reason, several pathways are being followed with the scope of reducing the model error (or the distributional shift between the learned model and the true one) and, more broadly, obtaining risk-aware solutions with respect to model uncertainty. But when it comes to the final application which baseline should a practitioner choose? In an offline context where computational time is not an issue and robustness is the priority we propose Exploitation vs Caution (EvC), a paradigm that (1) elegantly incorporates model uncertainty abiding by the Bayesian formalism, and (2) selects the policy that maximizes a risk-aware objective over the Bayesian posterior between a fixed set of candidate policies provided, for instance, by the current baselines. We validate EvC with state-of-the-art approaches in different discrete, yet simple, environments offering a fair variety of MDP classes. In the tested scenarios EvC manages to select robust policies and hence stands out as a useful tool for practitioners that aim to apply offline planning and reinforcement learning solvers in the real world.
△ Less
Submitted 11 April, 2023; v1 submitted 27 May, 2021;
originally announced May 2021.
-
Offline Learning for Planning: A Summary
Authors:
Giorgio Angelotti,
Nicolas Drougard,
Caroline Ponzoni Carvalho Chanel
Abstract:
The training of autonomous agents often requires expensive and unsafe trial-and-error interactions with the environment. Nowadays several data sets containing recorded experiences of intelligent agents performing various tasks, spanning from the control of unmanned vehicles to human-robot interaction and medical applications are accessible on the internet. With the intention of limiting the costs…
▽ More
The training of autonomous agents often requires expensive and unsafe trial-and-error interactions with the environment. Nowadays several data sets containing recorded experiences of intelligent agents performing various tasks, spanning from the control of unmanned vehicles to human-robot interaction and medical applications are accessible on the internet. With the intention of limiting the costs of the learning procedure it is convenient to exploit the information that is already available rather than collecting new data. Nevertheless, the incapability to augment the batch can lead the autonomous agents to develop far from optimal behaviours when the sampled experiences do not allow for a good estimate of the true distribution of the environment. Offline learning is the area of machine learning concerned with efficiently obtaining an optimal policy with a batch of previously collected experiences without further interaction with the environment. In this paper we adumbrate the ideas motivating the development of the state-of-the-art offline learning baselines. The listed methods consist in the introduction of epistemic uncertainty dependent constraints during the classical resolution of a Markov Decision Process, with and without function approximators, that aims to alleviate the bad effects of the distributional mismatch between the available samples and real world. We provide comments on the practical utility of the theoretical bounds that justify the application of these algorithms and suggest the utilization of Generative Adversarial Networks to estimate the distributional shift that affects all of the proposed model-free and model-based approaches.
△ Less
Submitted 5 October, 2020;
originally announced October 2020.