-
Human in the Loop Adaptive Optimization for Improved Time Series Forecasting
Authors:
Malik Tiomoko,
Hamza Cherkaoui,
Giuseppe Paolo,
Zhang Yili,
Yu Meng,
Zhang Keli,
Hafiz Tiomoko Ali
Abstract:
Time series forecasting models often produce systematic, predictable errors even in critical domains such as energy, finance, and healthcare. We introduce a novel post-training adaptive optimization framework that improves forecast accuracy without retraining or architectural changes. Our method automatically applies expressive transformations optimized via reinforcement learning, contextual bandits, or genetic algorithms to correct model outputs in a lightweight and model-agnostic way. Theoretically, we prove that affine corrections always reduce the mean squared error; practically, we extend this idea with dynamic action-based optimization. The framework also supports an optional human-in-the-loop component: domain experts can guide corrections using natural language, which is parsed into actions by a language model. Across multiple benchmarks (e.g., electricity, weather, traffic), we observe consistent accuracy gains with minimal computational overhead. Our interactive demo shows the framework's real-time usability. By combining automated post hoc refinement with interpretable and extensible mechanisms, our approach offers a powerful new direction for practical forecasting systems.
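The affine-correction claim in this abstract can be illustrated with a minimal sketch. It assumes the correction is fitted by ordinary least squares on the same data it is evaluated on; in that case the identity map (a = 1, b = 0) is one of the candidates, so the fitted correction cannot have a higher in-sample mean squared error than the raw forecast. The synthetic data and the names `fit_affine` and `mse` are illustrative assumptions, not the authors' implementation.

```python
import random


def fit_affine(forecast, target):
    """Closed-form least-squares fit of target ~ a * forecast + b.

    Assumes the forecast is not constant (non-zero variance).
    """
    n = len(forecast)
    mean_f = sum(forecast) / n
    mean_t = sum(target) / n
    var_f = sum((f - mean_f) ** 2 for f in forecast)
    cov_ft = sum((f - mean_f) * (t - mean_t) for f, t in zip(forecast, target))
    a = cov_ft / var_f
    b = mean_t - a * mean_f
    return a, b


def mse(pred, target):
    """Mean squared error between two equally long sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


# Synthetic example (assumption): a forecast with a scale and offset bias.
random.seed(0)
target = [random.gauss(0.0, 1.0) for _ in range(500)]
forecast = [0.7 * t + 0.5 + random.gauss(0.0, 0.1) for t in target]

a, b = fit_affine(forecast, target)
corrected = [a * f + b for f in forecast]

mse_raw = mse(forecast, target)
mse_corrected = mse(corrected, target)
print(f"raw MSE: {mse_raw:.4f}, corrected MSE: {mse_corrected:.4f}")
```

The guarantee illustrated here is in-sample only; on held-out data the fitted coefficients can overfit, which is one reason to validate a correction before deploying it.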
Submitted 21 May, 2025;
originally announced May 2025.
-
Price Impact of Health Insurance
Authors:
Andrea Di Giovan Paolo,
Jose Higueras
Abstract:
This paper examines the equilibrium effects of insurance contracts on healthcare markets using a mechanism design framework. A population of risk-averse agents with preferences as in Yaari (1987) faces the risk of developing an illness of unknown severity, which can be treated in a competitive hospital services market at the prevailing market price. After privately observing their health risk, but before learning their sickness level, agents have the option to purchase insurance from a monopolistic provider. Insurance contracts specify premiums, out-of-pocket costs (OPCs), and hospital service coverage, thus determining demand and price in the downstream hospital market through a market-clearing condition. Our first main result shows that optimal insurance contracts take a simple form: agents can choose between full hospital coverage with a high OPC or restricted coverage with a low OPC. This highlights a novel form of under-insurance (rationing or restricted access to healthcare services) emerging purely due to the insurer's attempt to control its price impact. Our second key result illustrates the nuanced effect of price impact on the amount of insurance provided. Higher healthcare prices increase insurer payouts but also worsen agents' outside options, making them more willing to pay for insurance ex ante. The net effect of these forces determines whether insurance provision exceeds or falls short of a price-taking benchmark.
Submitted 3 March, 2025;
originally announced March 2025.
-
TAG: A Decentralized Framework for Multi-Agent Hierarchical Reinforcement Learning
Authors:
Giuseppe Paolo,
Abdelhakim Benechehab,
Hamza Cherkaoui,
Albert Thomas,
Balázs Kégl
Abstract:
Hierarchical organization is fundamental to biological systems and human societies, yet artificial intelligence systems often rely on monolithic architectures that limit adaptability and scalability. Current hierarchical reinforcement learning (HRL) approaches typically restrict hierarchies to two levels or require centralized training, which limits their practical applicability. We introduce TAME Agent Framework (TAG), a framework for constructing fully decentralized hierarchical multi-agent systems. TAG enables hierarchies of arbitrary depth through a novel LevelEnv concept, which abstracts each hierarchy level as the environment for the agents above it. This approach standardizes information flow between levels while preserving loose coupling, allowing for seamless integration of diverse agent types. We demonstrate the effectiveness of TAG by implementing hierarchical architectures that combine different RL agents across multiple levels, achieving improved performance over classical multi-agent RL baselines on standard benchmarks. Our results show that decentralized hierarchical organization enhances both learning speed and final performance, positioning TAG as a promising direction for scalable multi-agent systems.
Submitted 5 March, 2025; v1 submitted 21 February, 2025;
originally announced February 2025.
-
AdaPTS: Adapting Univariate Foundation Models to Probabilistic Multivariate Time Series Forecasting
Authors:
Abdelhakim Benechehab,
Vasilii Feofanov,
Giuseppe Paolo,
Albert Thomas,
Maurizio Filippone,
Balázs Kégl
Abstract:
Pre-trained foundation models (FMs) have shown exceptional performance in univariate time series forecasting tasks. However, several practical challenges persist, including managing intricate dependencies among features and quantifying uncertainty in predictions. This study aims to tackle these critical limitations by introducing adapters, feature-space transformations that facilitate the effective use of pre-trained univariate time series FMs for multivariate tasks. Adapters operate by projecting multivariate inputs into a suitable latent space and applying the FM independently to each dimension. Inspired by the literature on representation learning and partially stochastic Bayesian neural networks, we present a range of adapters and optimization/inference strategies. Experiments conducted on both synthetic and real-world datasets confirm the efficacy of adapters, demonstrating substantial enhancements in forecasting accuracy and uncertainty quantification compared to baseline methods. Our framework, AdaPTS, positions adapters as a modular, scalable, and effective solution for leveraging time series FMs in multivariate contexts, thereby promoting their wider adoption in real-world applications. We release the code at https://github.com/abenechehab/AdaPTS.
Submitted 14 February, 2025;
originally announced February 2025.
-
Sunrise III: Overview of Observatory and Instruments
Authors:
Andreas Korpi-Lagg,
Achim Gandorfer,
Sami K. Solanki,
Jose Carlos del Toro Iniesta,
Yukio Katsukawa,
Pietro Bernasconi,
Thomas Berkefeld,
Alex Feller,
Tino L. Riethmüller,
Alberto Álvarez-Herrero,
Masahito Kubo,
Valentín Martínez Pillet,
H. N. Smitha,
David Orozco Suárez,
Bianca Grauf,
Michael Carpenter,
Alexander Bell,
María-Teresa Álvarez-Alonso,
Daniel Álvarez García,
Beatriz Aparicio del Moral,
Daniel Ayoub,
Francisco Javier Bailén,
Eduardo Bailón Martínez,
Maria Balaguer Jiménez,
Peter Barthol
, et al. (95 additional authors not shown)
Abstract:
In July 2024, Sunrise completed its third successful science flight. The Sunrise III observatory had been upgraded significantly after the two previous successful flights in 2009 and 2013. Three completely new instruments focus on the small-scale physical processes and their complex interaction from the deepest observable layers in the photosphere up to chromospheric heights. Previously poorly explored spectral regions and lines are exploited to paint a three-dimensional picture of the solar atmosphere with unprecedented completeness and level of detail. The full polarimetric information is captured by all three instruments to reveal the interaction between the magnetic fields and the hydrodynamic processes. Two slit-based spectropolarimeters, the Sunrise UV Spectropolarimeter and Imager (SUSI) and the Sunrise Chromospheric Infrared spectro-Polarimeter (SCIP), focus on the near-ultraviolet and the near-infrared regions respectively, and the imaging spectropolarimeter Tunable Magnetograph (TuMag) simultaneously obtains maps of the full field-of-view of $46 \times 46$ Mm$^2$ in the photosphere and the chromosphere in the visible. The instruments are operated in an orchestrated mode, benefiting from a new Image Stabilization and Light Distribution unit (ISLiD), with the Correlating Wavefront Sensor (CWS) providing the autofocus control and an image stability with a root-mean-square value smaller than 0.005''. A new gondola was constructed to significantly improve the telescope pointing stability, required to achieve uninterrupted observations over many hours. Sunrise III was launched successfully on July 10, 2024, from the Esrange Space Center near Kiruna (Sweden). It reached the landing site between the Mackenzie River and the Great Bear Lake in Canada after a flight duration of 6.5 days. In this paper, we give an overview of the Sunrise III observatory and its instruments.
Submitted 30 May, 2025; v1 submitted 10 February, 2025;
originally announced February 2025.
-
Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level
Authors:
Antoine Grosnit,
Alexandre Maraval,
James Doran,
Giuseppe Paolo,
Albert Thomas,
Refinath Shahul Hameed Nabeezath Beevi,
Jonas Gonzalez,
Khyati Khandelwal,
Ignacio Iacobacci,
Abdelhakim Benechehab,
Hamza Cherkaoui,
Youssef Attia El-Hili,
Kun Shao,
Jianye Hao,
Jun Yao,
Balazs Kegl,
Haitham Bou-Ammar,
Jun Wang
Abstract:
We introduce Agent K v1.0, an end-to-end autonomous data science agent designed to automate, optimise, and generalise across diverse data science tasks. Fully automated, Agent K v1.0 manages the entire data science life cycle by learning from experience. It leverages a highly flexible structured reasoning framework to enable it to dynamically process memory in a nested structure, effectively learning from accumulated experience stored to handle complex reasoning tasks. It optimises long- and short-term memory by selectively storing and retrieving key information, guiding future decisions based on environmental rewards. This iterative approach allows it to refine decisions without fine-tuning or backpropagation, achieving continuous improvement through experiential learning. We evaluate our agent's capabilities using Kaggle competitions as a case study. Following a fully automated protocol, Agent K v1.0 systematically addresses complex and multimodal data science tasks, employing Bayesian optimisation for hyperparameter tuning and feature engineering. Our new evaluation framework rigorously assesses Agent K v1.0's end-to-end capabilities to generate and send submissions starting from a Kaggle competition URL. Results demonstrate that Agent K v1.0 achieves a 92.5\% success rate across tasks, spanning tabular, computer vision, NLP, and multimodal domains. When benchmarking against 5,856 human Kaggle competitors by calculating Elo-MMR scores for each, Agent K v1.0 ranks in the top 38\%, demonstrating an overall skill level comparable to Expert-level users. Notably, its Elo-MMR score falls between the first and third quartiles of scores achieved by human Grandmasters. Furthermore, our results indicate that Agent K v1.0 has reached a performance level equivalent to Kaggle Grandmaster, with a record of 6 gold, 3 silver, and 7 bronze medals, as defined by Kaggle's progression system.
Submitted 5 November, 2024;
originally announced November 2024.
-
Zero-shot Model-based Reinforcement Learning using Large Language Models
Authors:
Abdelhakim Benechehab,
Youssef Attia El Hili,
Ambroise Odonnat,
Oussama Zekri,
Albert Thomas,
Giuseppe Paolo,
Maurizio Filippone,
Ievgen Redko,
Balázs Kégl
Abstract:
The emerging zero-shot capabilities of Large Language Models (LLMs) have led to their applications in areas extending well beyond natural language processing tasks. In reinforcement learning, while LLMs have been extensively used in text-based environments, their integration with continuous state spaces remains understudied. In this paper, we investigate how pre-trained LLMs can be leveraged to predict in context the dynamics of continuous Markov decision processes. We identify handling multivariate data and incorporating the control signal as key challenges that limit the potential of LLMs' deployment in this setup and propose Disentangled In-Context Learning (DICL) to address them. We present proof-of-concept applications in two reinforcement learning settings: model-based policy evaluation and data-augmented off-policy reinforcement learning, supported by theoretical analysis of the proposed methods. Our experiments further demonstrate that our approach produces well-calibrated uncertainty estimates. We release the code at https://github.com/abenechehab/dicl.
Submitted 13 February, 2025; v1 submitted 15 October, 2024;
originally announced October 2024.
-
SAMformer: Unlocking the Potential of Transformers in Time Series Forecasting with Sharpness-Aware Minimization and Channel-Wise Attention
Authors:
Romain Ilbert,
Ambroise Odonnat,
Vasilii Feofanov,
Aladin Virmaux,
Giuseppe Paolo,
Themis Palpanas,
Ievgen Redko
Abstract:
Transformer-based architectures achieved breakthrough performance in natural language processing and computer vision, yet they remain inferior to simpler linear baselines in multivariate long-term forecasting. To better understand this phenomenon, we start by studying a toy linear forecasting problem for which we show that transformers are incapable of converging to their true solution despite their high expressive power. We further identify the attention of transformers as being responsible for this low generalization capacity. Building upon this insight, we propose a shallow lightweight transformer model that successfully escapes bad local minima when optimized with sharpness-aware optimization. We empirically demonstrate that this result extends to all commonly used real-world multivariate time series datasets. In particular, SAMformer surpasses current state-of-the-art methods and is on par with the biggest foundation model MOIRAI while having significantly fewer parameters. The code is available at https://github.com/romilbert/samformer.
Submitted 3 June, 2024; v1 submitted 15 February, 2024;
originally announced February 2024.
-
A call for embodied AI
Authors:
Giuseppe Paolo,
Jonas Gonzalez-Billandon,
Balázs Kégl
Abstract:
We propose Embodied AI as the next fundamental step in the pursuit of Artificial General Intelligence, juxtaposing it against current AI advancements, particularly Large Language Models. We traverse the evolution of the embodiment concept across diverse fields - philosophy, psychology, neuroscience, and robotics - to highlight how EAI distinguishes itself from the classical paradigm of static learning. By broadening the scope of Embodied AI, we introduce a theoretical framework based on cognitive architectures, emphasizing perception, action, memory, and learning as essential components of an embodied agent. This framework is aligned with Friston's active inference principle, offering a comprehensive approach to EAI development. Despite the progress made in the field of AI, substantial challenges, such as the formulation of a novel AI learning theory and the innovation of advanced hardware, persist. Our discussion lays down a foundational guideline for future Embodied AI research. Highlighting the importance of creating Embodied AI agents capable of seamless communication, collaboration, and coexistence with humans and other intelligent entities within real-world environments, we aim to steer the AI community towards addressing the multifaceted challenges and seizing the opportunities that lie ahead in the quest for AGI.
Submitted 13 September, 2024; v1 submitted 6 February, 2024;
originally announced February 2024.
-
A Multi-step Loss Function for Robust Learning of the Dynamics in Model-based Reinforcement Learning
Authors:
Abdelhakim Benechehab,
Albert Thomas,
Giuseppe Paolo,
Maurizio Filippone,
Balázs Kégl
Abstract:
In model-based reinforcement learning, most algorithms rely on simulating trajectories from one-step models of the dynamics learned on data. A critical challenge of this approach is the compounding of one-step prediction errors as the length of the trajectory grows. In this paper we tackle this issue by using a multi-step objective to train one-step models. Our objective is a weighted sum of the mean squared error (MSE) loss at various future horizons. We find that this new loss is particularly useful when the data is noisy (additive Gaussian noise in the observations), which is often the case in real-life environments. To support the multi-step loss, first we study its properties in two tractable cases: i) uni-dimensional linear system, and ii) two-parameter non-linear system. Second, we show in a variety of tasks (environments or datasets) that the models learned with this loss achieve a significant improvement in terms of the averaged R2-score on future prediction horizons. Finally, in the pure batch reinforcement learning setting, we demonstrate that one-step models serve as strong baselines when dynamics are deterministic, while multi-step models would be more advantageous in the presence of noise, highlighting the potential of our approach in real-world applications.
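The multi-step objective described in this abstract can be sketched for the simplest tractable case it mentions, a uni-dimensional linear system. The sketch below assumes a scalar one-step model x_{t+1} = theta * x_t, so the h-step rollout is theta^h * x_t; the weight values and the function name `multi_step_loss` are hypothetical and chosen only for illustration.

```python
def multi_step_loss(theta, series, weights):
    """Weighted sum over horizons h = 1..H of the MSE between the h-step
    rollout of the one-step model x -> theta * x and the observed value."""
    total = 0.0
    for h, w in enumerate(weights, start=1):
        # Rolling the one-step linear model forward h times gives theta**h * x_t.
        errors = [
            (theta ** h * series[t] - series[t + h]) ** 2
            for t in range(len(series) - h)
        ]
        total += w * sum(errors) / len(errors)
    return total


# Noise-free trajectory generated by the true dynamics x_{t+1} = 0.9 * x_t.
series = [1.0]
for _ in range(20):
    series.append(0.9 * series[-1])

weights = [0.5, 0.3, 0.2]  # hypothetical weights for horizons 1, 2, 3
loss_true = multi_step_loss(0.9, series, weights)
loss_wrong = multi_step_loss(0.5, series, weights)
print(loss_true, loss_wrong)
```

Setting `weights = [1.0]` recovers the usual one-step MSE, which makes the one-step baseline a special case of the same function.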
Submitted 5 February, 2024;
originally announced February 2024.
-
Comparison analysis between standard polysomnographic data and in-ear-EEG signals: A preliminary study
Authors:
Gianpaolo Palo,
Luigi Fiorillo,
Giuliana Monachino,
Michal Bechny,
Michel Walti,
Elias Meier,
Francesca Pentimalli Biscaretti di Ruffia,
Mark Melnykowycz,
Athina Tzovara,
Valentina Agostini,
Francesca Dalia Faraci
Abstract:
Study Objectives: Polysomnography (PSG) currently serves as the benchmark for evaluating sleep disorders. Its discomfort makes long-term monitoring unfeasible, leading to bias in sleep quality assessment. Hence, less invasive, cost-effective, and portable alternatives need to be explored. One promising contender is the in-ear-EEG sensor. This study aims to establish a methodology to assess the similarity between the single-channel in-ear-EEG and standard PSG derivations.
Methods: The study involves four-hour signals recorded from ten healthy subjects aged 18 to 60 years. Recordings are analyzed following two complementary approaches: (i) a hypnogram-based analysis aimed at assessing the agreement between PSG and in-ear-EEG-derived hypnograms; and (ii) a feature-based analysis based on time- and frequency- domain feature extraction, unsupervised feature selection, and definition of Feature-based Similarity Index via Jensen-Shannon Divergence (JSD-FSI).
Results: We find large variability between PSG and in-ear-EEG hypnograms scored by the same sleep expert according to Cohen's kappa metric, with significantly greater agreements for PSG scorers than for in-ear-EEG scorers (p < 0.001) based on Fleiss' kappa metric. On average, we demonstrate a high similarity between PSG and in-ear-EEG signals in terms of JSD-FSI (0.79 ± 0.06 awake, 0.77 ± 0.07 NREM, and 0.67 ± 0.10 REM), in line with the similarity values computed independently on standard PSG-channel-combinations.
Conclusions: In-ear-EEG is a valuable solution for home-based sleep monitoring; however, further studies with a larger and more heterogeneous dataset are needed.
Submitted 6 August, 2024; v1 submitted 18 January, 2024;
originally announced January 2024.
-
Multi-timestep models for Model-based Reinforcement Learning
Authors:
Abdelhakim Benechehab,
Giuseppe Paolo,
Albert Thomas,
Maurizio Filippone,
Balázs Kégl
Abstract:
In model-based reinforcement learning (MBRL), most algorithms rely on simulating trajectories from one-step dynamics models learned on data. A critical challenge of this approach is the compounding of one-step prediction errors as the length of the trajectory grows. In this paper we tackle this issue by using a multi-timestep objective to train one-step models. Our objective is a weighted sum of a loss function (e.g., negative log-likelihood) at various future horizons. We explore and test a range of weight profiles. We find that exponentially decaying weights lead to models that significantly improve the long-horizon R2 score. This improvement is particularly noticeable when the models were evaluated on noisy data. Finally, using a soft actor-critic (SAC) agent in pure batch reinforcement learning (RL) and iterated batch RL scenarios, we found that our multi-timestep models outperform or match standard one-step models. This was especially evident in a noisy variant of the considered environment, highlighting the potential of our approach in real-world applications.
Submitted 11 October, 2023; v1 submitted 9 October, 2023;
originally announced October 2023.
-
Guided Safe Shooting: model based reinforcement learning with safety constraints
Authors:
Giuseppe Paolo,
Jonas Gonzalez-Billandon,
Albert Thomas,
Balázs Kégl
Abstract:
In the last decade, reinforcement learning successfully solved complex control tasks and decision-making problems, like the Go board game. Yet, there are few success stories when it comes to deploying those algorithms to real-world scenarios. One of the reasons is the lack of guarantees when dealing with and avoiding unsafe states, a fundamental requirement in critical control engineering systems. In this paper, we introduce Guided Safe Shooting (GuSS), a model-based RL approach that can learn to control systems with minimal violations of the safety constraints. The model is learned on the data collected during the operation of the system in an iterated batch fashion, and is then used to plan for the best action to perform at each time step. We propose three different safe planners, one based on a simple random shooting strategy and two based on MAP-Elites, a more advanced divergent-search algorithm. Experiments show that these planners help the learning agent avoid unsafe situations while maximally exploring the state space, a necessary aspect when learning an accurate model of the system. Furthermore, compared to model-free approaches, learning a model allows GuSS to reduce the number of interactions with the real system while still reaching high rewards, a fundamental requirement when handling engineering systems.
Submitted 12 September, 2024; v1 submitted 20 June, 2022;
originally announced June 2022.
-
Learning in Sparse Rewards settings through Quality-Diversity algorithms
Authors:
Giuseppe Paolo
Abstract:
In the Reinforcement Learning (RL) framework, the learning is guided through a reward signal. This means that in situations of sparse rewards the agent has to focus on exploration, in order to discover which action, or set of actions, leads to the reward. RL agents usually struggle with this. Exploration is the focus of Quality-Diversity (QD) methods. In this thesis, we approach the problem of sparse rewards with these algorithms, and in particular with Novelty Search (NS). This is a method that only focuses on the diversity of the possible policies' behaviors. The first part of the thesis focuses on learning a representation of the space in which the diversity of the policies is evaluated. In this regard, we propose the TAXONS algorithm, a method that learns a low-dimensional representation of the search space through an AutoEncoder. While effective, TAXONS still requires information on when to capture the observation used to learn said space. For this, we study multiple ways, and in particular the signature transform, to encode information about the whole trajectory of observations. The thesis continues with the introduction of the SERENE algorithm, a method that can efficiently focus on the interesting parts of the search space. This method separates the exploration of the search space from the exploitation of the reward through a two-alternating-steps approach. The exploration is performed through NS. Any discovered reward is then locally exploited through emitters. The third and final contribution combines TAXONS and SERENE into a single approach: STAX. Throughout this thesis, we introduce methods that lower the amount of prior information needed in sparse rewards settings. These contributions are a promising step towards the development of methods that can autonomously explore and find high-performance policies in a variety of sparse rewards settings.
Submitted 2 March, 2022;
originally announced March 2022.
-
Discovering and Exploiting Sparse Rewards in a Learned Behavior Space
Authors:
Giuseppe Paolo,
Miranda Coninx,
Alban Laflaquière,
Stephane Doncieux
Abstract:
Learning optimal policies in sparse rewards settings is difficult as the learning agent has little to no feedback on the quality of its actions. In these situations, a good strategy is to focus on exploration, hopefully leading to the discovery of a reward signal to improve on. A learning algorithm capable of dealing with this kind of setting has to be able to (1) explore possible agent behaviors and (2) exploit any possible discovered reward. Efficient exploration algorithms have been proposed that require defining a behavior space, which associates an agent with its resulting behavior in a space that is known to be worth exploring. The need to define this space is a limitation of these algorithms. In this work, we introduce STAX, an algorithm designed to learn a behavior space on-the-fly and to explore it while efficiently optimizing any reward discovered. It does so by separating the exploration and learning of the behavior space from the exploitation of the reward through an alternating two-step process. In the first step, STAX builds a repertoire of diverse policies while learning a low-dimensional representation of the high-dimensional observations generated during policy evaluation. In the exploitation step, emitters are used to optimize the performance of the discovered rewarding solutions. Experiments conducted on three different sparse reward environments show that STAX performs comparably to existing baselines while requiring much less prior information about the task as it autonomously builds the behavior space.
Submitted 26 September, 2023; v1 submitted 2 November, 2021;
originally announced November 2021.
-
Sparse Reward Exploration via Novelty Search and Emitters
Authors:
Giuseppe Paolo,
Alexandre Coninx,
Stephane Doncieux,
Alban Laflaquière
Abstract:
Reward-based optimization algorithms require both exploration, to find rewards, and exploitation, to maximize performance. The need for efficient exploration is even more significant in sparse reward settings, in which performance feedback is given sparingly, thus rendering it unsuitable for guiding the search process. In this work, we introduce the SparsE Reward Exploration via Novelty and Emitters (SERENE) algorithm, capable of efficiently exploring a search space, as well as optimizing rewards found in potentially disparate areas. Contrary to existing emitters-based approaches, SERENE separates the search space exploration and reward exploitation into two alternating processes. The first process performs exploration through Novelty Search, a divergent search algorithm. The second one exploits discovered reward areas through emitters, i.e. local instances of population-based optimization algorithms. A meta-scheduler allocates a global computational budget by alternating between the two processes, ensuring the discovery and efficient exploitation of disjoint reward areas. SERENE returns both a collection of diverse solutions covering the search space and a collection of high-performing solutions for each distinct reward area. We evaluate SERENE on various sparse reward environments and show it compares favorably to existing baselines.
Submitted 16 April, 2021; v1 submitted 5 February, 2021;
originally announced February 2021.
-
Novelty Search makes Evolvability Inevitable
Authors:
Stephane Doncieux,
Giuseppe Paolo,
Alban Laflaquière,
Alexandre Coninx
Abstract:
Evolvability is an important feature that impacts the ability of evolutionary processes to find interesting novel solutions and to deal with changing conditions of the problem to solve. The estimation of evolvability is not straightforward and is generally too expensive to be directly used as selective pressure in the evolutionary process. Indirectly promoting evolvability as a side effect of other selection pressures that are easier and faster to compute would thus be advantageous. In an unbounded behavior space, it has already been shown that evolvable individuals naturally appear and tend to be selected as they are more likely to invade empty behavior niches. Evolvability is thus a natural byproduct of the search in this context. However, practical agents and environments often impose limits on the reachable behavior space. How do these boundaries impact evolvability? In this context, can evolvability still be promoted without explicitly rewarding it? We show that Novelty Search implicitly creates a pressure for high evolvability even in bounded behavior spaces, and explore the reasons for such behavior. More precisely, we show that, throughout the search, the dynamic evaluation of novelty rewards individuals that are very mobile in the behavior space, which in turn promotes evolvability.
Submitted 13 May, 2020;
originally announced May 2020.
-
Unsupervised Learning and Exploration of Reachable Outcome Space
Authors:
Giuseppe Paolo,
Alban Laflaquière,
Alexandre Coninx,
Stephane Doncieux
Abstract:
Performing Reinforcement Learning in sparse rewards settings, with very little prior knowledge, is a challenging problem since there is no signal to properly guide the learning process. In such situations, a good search strategy is fundamental. At the same time, not having to adapt the algorithm to every single problem is very desirable. Here we introduce TAXONS, a Task Agnostic eXploration of Outcome spaces through Novelty and Surprise algorithm. Based on a population-based divergent-search approach, it learns a set of diverse policies directly from high-dimensional observations, without any task-specific information. TAXONS builds a repertoire of policies while training an autoencoder on the high-dimensional observation of the final state of the system to build a low-dimensional outcome space. The learned outcome space, combined with the reconstruction error, is used to drive the search for new policies. Results show that TAXONS can find a diverse set of controllers, covering a good part of the ground-truth outcome space, while having no information about such space.
Submitted 4 May, 2020; v1 submitted 12 September, 2019;
originally announced September 2019.
-
A Critical-like Collective State Leads to Long-range Cell Communication in Dictyostelium discoideum Aggregation
Authors:
Giovanna De Palo,
Darvin Yi,
Robert G. Endres
Abstract:
The transition from single-cell to multicellular behavior is important in early development but rarely studied. The starvation-induced aggregation of the social amoeba Dictyostelium discoideum into a multicellular slug is known to result from single-cell chemotaxis towards emitted pulses of cyclic adenosine monophosphate (cAMP). However, how exactly do transient short-range chemical gradients lead to coherent collective movement at a macroscopic scale? Here, we use a multiscale model verified by quantitative microscopy to describe wide-ranging behaviors from chemotaxis and excitability of individual cells to aggregation of thousands of cells. To better understand the mechanism of long-range cell-cell communication and hence aggregation, we analyze cell-cell correlations, showing evidence for self-organization at the onset of aggregation (as opposed to following a leader cell). Surprisingly, cell collectives, despite their finite size, show features of criticality known from phase transitions in physical systems. Application of external cAMP perturbations in our simulations near the sensitive critical point allows steering cells into early aggregation and towards certain locations but not once an aggregation center has been chosen.
Submitted 11 January, 2018;
originally announced January 2018.
-
A Data-driven Model for Interaction-aware Pedestrian Motion Prediction in Object Cluttered Environments
Authors:
Mark Pfeiffer,
Giuseppe Paolo,
Hannes Sommer,
Juan Nieto,
Roland Siegwart,
Cesar Cadena
Abstract:
This paper reports on a data-driven, interaction-aware motion prediction approach for pedestrians in environments cluttered with static obstacles. When navigating in such workspaces shared with humans, robots need accurate motion predictions of the surrounding pedestrians. Human navigation behavior is mostly influenced by the surrounding pedestrians and by the static obstacles in their vicinity. In this paper we introduce a new model based on Long Short-Term Memory (LSTM) neural networks, which is able to learn human motion behavior from demonstrated data. To the best of our knowledge, this is the first approach using LSTMs that incorporates both static obstacles and surrounding pedestrians for trajectory forecasting. As part of the model, we introduce a new way of encoding surrounding pedestrians based on a 1d-grid in polar angle space. We evaluate the benefit of interaction-aware motion prediction and the added value of incorporating static obstacles on both simulation and real-world datasets by comparing with state-of-the-art approaches. The results show that our new approach outperforms the other approaches while being very computationally efficient, and that taking into account static obstacles for motion predictions significantly improves the prediction accuracy, especially in cluttered environments.
Submitted 26 February, 2018; v1 submitted 25 September, 2017;
originally announced September 2017.
-
Towards continuous control of flippers for a multi-terrain robot using deep reinforcement learning
Authors:
Giuseppe Paolo,
Lei Tai,
Ming Liu
Abstract:
In this paper we focus on developing a control algorithm for multi-terrain tracked robots with flippers using a reinforcement learning (RL) approach. The work is based on the deep deterministic policy gradient (DDPG) algorithm, proven to be very successful in simple simulation environments. The algorithm works in an end-to-end fashion in order to control the continuous position of the flippers. This end-to-end approach makes it easy to apply the controller to a wide array of circumstances, but the huge flexibility comes at the cost of a harder problem to solve. The complexity of the task is increased even more by the fact that real multi-terrain robots move in partially observable environments. Notwithstanding these complications, being able to smoothly control a multi-terrain robot can produce huge benefits in impaired people's daily lives or in search and rescue situations.
Submitted 25 September, 2017;
originally announced September 2017.
-
Virtual-to-real Deep Reinforcement Learning: Continuous Control of Mobile Robots for Mapless Navigation
Authors:
Lei Tai,
Giuseppe Paolo,
Ming Liu
Abstract:
We present a learning-based mapless motion planner by taking the sparse 10-dimensional range findings and the target position with respect to the mobile robot coordinate frame as input and the continuous steering commands as output. Traditional motion planners for mobile ground robots with a laser range sensor mostly depend on the obstacle map of the navigation environment where both the highly precise laser sensor and the obstacle map building work of the environment are indispensable. We show that, through an asynchronous deep reinforcement learning method, a mapless motion planner can be trained end-to-end without any manually designed features and prior demonstrations. The trained planner can be directly applied in unseen virtual and real environments. The experiments show that the proposed mapless motion planner can navigate the nonholonomic mobile robot to the desired targets without colliding with any obstacles.
Submitted 21 July, 2017; v1 submitted 1 March, 2017;
originally announced March 2017.
-
Unraveling Adaptation in Eukaryotic Pathways: Lessons from Protocells
Authors:
Giovanna De Palo,
Robert G. Endres
Abstract:
Eukaryotic adaptation pathways operate within wide-ranging environmental conditions without stimulus saturation. Despite numerous differences in the adaptation mechanisms employed by bacteria and eukaryotes, all require energy consumption. Here, we present two minimal models showing that expenditure of energy by the cell is not essential for adaptation. Both models share important features with large eukaryotic cells: they employ small diffusible molecules and involve receptor subunits resembling highly conserved G-protein cascades. Analyzing the drawbacks of these models helps us understand the benefits of energy consumption, in terms of adjustability of response and adaptation times as well as separation of cell-external sensing and cell-internal signaling. Our work thus sheds new light on the evolution of adaptation mechanisms in complex systems.
Submitted 13 September, 2013;
originally announced September 2013.
-
Properties of a family of n reggeized gluon states in multicolour QCD
Authors:
Vacca Gian Paolo
Abstract:
A general relation between families of (n+1) gluon and n gluon eigenstates of the BKP evolution kernels in the multicolour limit of QCD is derived. It allows one to construct an (n+1) gluon eigenstate if an n gluon eigenstate is known; this solution is Bose symmetric and thus physical for even n. A recently found family of odderon solutions corresponds to the particular case n=2.
Submitted 7 July, 2000;
originally announced July 2000.