Search | arXiv e-print repository

MPX: Mixed Precision Training for JAX

Authors: Alexander Gräfe, Sebastian Trimpe

Abstract: Mixed-precision training has emerged as an indispensable tool for enhancing the efficiency of neural network training in recent years. Concurrently, JAX has grown in popularity as a versatile machine learning toolbox. However, it currently lacks robust support for mixed-precision training. We propose MPX, a mixed-precision training toolbox for JAX that simplifies and accelerates the training of la… ▽ More Mixed-precision training has emerged as an indispensable tool for enhancing the efficiency of neural network training in recent years. Concurrently, JAX has grown in popularity as a versatile machine learning toolbox. However, it currently lacks robust support for mixed-precision training. We propose MPX, a mixed-precision training toolbox for JAX that simplifies and accelerates the training of large-scale neural networks while preserving model accuracy. MPX seamlessly integrates with popular toolboxes such as Equinox and Flax, allowing users to convert full-precision pipelines to mixed-precision versions with minimal modifications. By casting both inputs and outputs to half precision, and introducing a dynamic loss-scaling mechanism, MPX alleviates issues like gradient underflow and overflow that commonly arise in half precision computations. Its design inherits critical features from JAX's type-promotion behavior, ensuring that operations take place in the correct precision and allowing for selective enforcement of full precision where needed (e.g., sums, means, or softmax). MPX further provides wrappers for automatic creation and management of mixed-precision gradients and optimizers, enabling straightforward integration into existing JAX training pipelines. MPX's source code, documentation, and usage examples are available at github.com/Data-Science-in-Mechanical-Engineering/mixed_precision_for_JAX . △ Less

Submitted 8 July, 2025; v1 submitted 4 July, 2025; originally announced July 2025.

arXiv:2506.17994 [pdf, ps, other]

Newtonian and Lagrangian Neural Networks: A Comparison Towards Efficient Inverse Dynamics Identification

Authors: Minh Trinh, Andreas René Geist, Josefine Monnet, Stefan Vilceanu, Sebastian Trimpe, Christian Brecher

Abstract: Accurate inverse dynamics models are essential tools for controlling industrial robots. Recent research combines neural network regression with inverse dynamics formulations of the Newton-Euler and the Euler-Lagrange equations of motion, resulting in so-called Newtonian neural networks and Lagrangian neural networks, respectively. These physics-informed models seek to identify unknowns in the anal… ▽ More Accurate inverse dynamics models are essential tools for controlling industrial robots. Recent research combines neural network regression with inverse dynamics formulations of the Newton-Euler and the Euler-Lagrange equations of motion, resulting in so-called Newtonian neural networks and Lagrangian neural networks, respectively. These physics-informed models seek to identify unknowns in the analytical equations from data. Despite their potential, current literature lacks guidance on choosing between Lagrangian and Newtonian networks. In this study, we show that when motor torques are estimated instead of directly measuring joint torques, Lagrangian networks prove less effective compared to Newtonian networks as they do not explicitly model dissipative torques. The performance of these models is compared to neural network regression on data of a MABI MAX 100 industrial robot. △ Less

Submitted 22 June, 2025; originally announced June 2025.

Comments: Paper accepted for publication in 14th IFAC Symposium on Robotics

ACM Class: I.2.9; I.2.6; I.6.4

arXiv:2506.10871 [pdf, ps, other]

Viability of Future Actions: Robust Safety in Reinforcement Learning via Entropy Regularization

Authors: Pierre-François Massiani, Alexander von Rohr, Lukas Haverbeck, Sebastian Trimpe

Abstract: Despite the many recent advances in reinforcement learning (RL), the question of learning policies that robustly satisfy state constraints under unknown disturbances remains open. In this paper, we offer a new perspective on achieving robust safety by analyzing the interplay between two well-established techniques in model-free RL: entropy regularization, and constraints penalization. We reveal em… ▽ More Despite the many recent advances in reinforcement learning (RL), the question of learning policies that robustly satisfy state constraints under unknown disturbances remains open. In this paper, we offer a new perspective on achieving robust safety by analyzing the interplay between two well-established techniques in model-free RL: entropy regularization, and constraints penalization. We reveal empirically that entropy regularization in constrained RL inherently biases learning toward maximizing the number of future viable actions, thereby promoting constraints satisfaction robust to action noise. Furthermore, we show that by relaxing strict safety constraints through penalties, the constrained RL problem can be approximated arbitrarily closely by an unconstrained one and thus solved using standard model-free RL. This reformulation preserves both safety and optimality while empirically improving resilience to disturbances. Our results indicate that the connection between entropy regularization and robustness is a promising avenue for further empirical and theoretical investigation, as it enables robust safety in RL through simple reward shaping. △ Less

Submitted 12 June, 2025; originally announced June 2025.

Comments: 24 pages, 11 figures, 2 tables. Accepted for publication at ECML-PKDD 2025

arXiv:2506.03898 [pdf, ps, other]

A kernel conditional two-sample test

Authors: Pierre-François Massiani, Christian Fiedler, Lukas Haverbeck, Friedrich Solowjow, Sebastian Trimpe

Abstract: We propose a framework for hypothesis testing on conditional probability distributions, which we then use to construct conditional two-sample statistical tests. These tests identify the inputs -- called covariates in this context -- where two conditional expectations differ with high probability. Our key idea is to transform confidence bounds of a learning method into a conditional two-sample test… ▽ More We propose a framework for hypothesis testing on conditional probability distributions, which we then use to construct conditional two-sample statistical tests. These tests identify the inputs -- called covariates in this context -- where two conditional expectations differ with high probability. Our key idea is to transform confidence bounds of a learning method into a conditional two-sample test, and we instantiate this principle for kernel ridge regression (KRR) and conditional kernel mean embeddings. We generalize existing pointwise-in-time or time-uniform confidence bounds for KRR to previously-inaccessible yet essential cases such as infinite-dimensional outputs with non-trace-class kernels. These bounds enable circumventing the need for independent data in our statistical tests, since they allow online sampling. We also introduce bootstrapping schemes leveraging the parametric form of testing thresholds identified in theory to avoid tuning inaccessible parameters, making our method readily applicable in practice. Such conditional two-sample tests are especially relevant in applications where data arrive sequentially or non-independently, or when output distributions vary with operational parameters. We demonstrate their utility through examples in process monitoring and comparison of dynamical systems. Overall, our results establish a comprehensive foundation for conditional two-sample testing, from theoretical guarantees to practical implementation, and advance the state-of-the-art on the concentration of vector-valued least squares estimation. △ Less

Submitted 4 June, 2025; originally announced June 2025.

Comments: 40 pages, 8 figures, 8 tables. Under review

arXiv:2504.05024 [pdf, other]

Concept Extraction for Time Series with ECLAD-ts

Authors: Antonia Holzapfel, Andres Felipe Posada-Moreno, Sebastian Trimpe

Abstract: Convolutional neural networks (CNNs) for time series classification (TSC) are being increasingly used in applications ranging from quality prediction to medical diagnosis. The black box nature of these models makes understanding their prediction process difficult. This issue is crucial because CNNs are prone to learning shortcuts and biases, compromising their robustness and alignment with human e… ▽ More Convolutional neural networks (CNNs) for time series classification (TSC) are being increasingly used in applications ranging from quality prediction to medical diagnosis. The black box nature of these models makes understanding their prediction process difficult. This issue is crucial because CNNs are prone to learning shortcuts and biases, compromising their robustness and alignment with human expectations. To assess whether such mechanisms are being used and the associated risk, it is essential to provide model explanations that reflect the inner workings of the model. Concept Extraction (CE) methods offer such explanations, but have mostly been developed for the image domain so far, leaving a gap in the time series domain. In this work, we present a CE and localization method tailored to the time series domain, based on the ideas of CE methods for images. We propose the novel method ECLAD-ts, which provides post-hoc global explanations based on how the models encode subsets of the input at different levels of abstraction. For this, concepts are produced by clustering timestep-wise aggregations of CNN activation maps, and their importance is computed based on their impact on the prediction process. We evaluate our method on synthetic and natural datasets. Furthermore, we assess the advantages and limitations of CE in time series through empirical results. Our results show that ECLAD-ts effectively explains models by leveraging their internal representations, providing useful insights about their prediction process. △ Less

Submitted 7 April, 2025; originally announced April 2025.

arXiv:2504.04603 [pdf, other]

Diffusion-Based Approximate MPC: Fast and Consistent Imitation of Multi-Modal Action Distributions

Authors: Pau Marquez Julbe, Julian Nubert, Henrik Hose, Sebastian Trimpe, Katherine J. Kuchenbecker

Abstract: Approximating model predictive control (MPC) using imitation learning (IL) allows for fast control without solving expensive optimization problems online. However, methods that use neural networks in a simple L2-regression setup fail to approximate multi-modal (set-valued) solution distributions caused by local optima found by the numerical solver or non-convex constraints, such as obstacles, sign… ▽ More Approximating model predictive control (MPC) using imitation learning (IL) allows for fast control without solving expensive optimization problems online. However, methods that use neural networks in a simple L2-regression setup fail to approximate multi-modal (set-valued) solution distributions caused by local optima found by the numerical solver or non-convex constraints, such as obstacles, significantly limiting the applicability of approximate MPC in practice. We solve this issue by using diffusion models to accurately represent the complete solution distribution (i.e., all modes) at high control rates (more than 1000 Hz). This work shows that diffusion based AMPC significantly outperforms L2-regression-based approximate MPC for multi-modal action distributions. In contrast to most earlier work on IL, we also focus on running the diffusion-based controller at a higher rate and in joint space instead of end-effector space. Additionally, we propose the use of gradient guidance during the denoising process to consistently pick the same mode in closed loop to prevent switching between solutions. We propose using the cost and constraint satisfaction of the original MPC problem during parallel sampling of solutions from the diffusion model to pick a better mode online. We evaluate our method on the fast and accurate control of a 7-DoF robot manipulator both in simulation and on hardware deployed at 250 Hz, achieving a speedup of more than 70 times compared to solving the MPC problem online and also outperforming the numerical optimization (used for training) in success ratio. Project website: https://paumarquez.github.io/diffusion-ampc. △ Less

Submitted 13 April, 2025; v1 submitted 6 April, 2025; originally announced April 2025.

arXiv:2502.04582 [pdf, other]

The Mini Wheelbot: A Testbed for Learning-based Balancing, Flips, and Articulated Driving

Authors: Henrik Hose, Jan Weisgerber, Sebastian Trimpe

Abstract: The Mini Wheelbot is a balancing, reaction wheel unicycle robot designed as a testbed for learning-based control. It is an unstable system with highly nonlinear yaw dynamics, non-holonomic driving, and discrete contact switches in a small, powerful, and rugged form factor. The Mini Wheelbot can use its wheels to stand up from any initial orientation - enabling automatic environment resets in repet… ▽ More The Mini Wheelbot is a balancing, reaction wheel unicycle robot designed as a testbed for learning-based control. It is an unstable system with highly nonlinear yaw dynamics, non-holonomic driving, and discrete contact switches in a small, powerful, and rugged form factor. The Mini Wheelbot can use its wheels to stand up from any initial orientation - enabling automatic environment resets in repetitive experiments and even challenging half flips. We illustrate the effectiveness of the Mini Wheelbot as a testbed by implementing two popular learning-based control algorithms. First, we showcase Bayesian optimization for tuning the balancing controller. Second, we use imitation learning from an expert nonlinear MPC that uses gyroscopic effects to reorient the robot and can track higher-level velocity and orientation commands. The latter allows the robot to drive around based on user commands - for the first time in this class of robots. The Mini Wheelbot is not only compelling for testing learning-based control algorithms, but it is also just fun to work with, as demonstrated in the video of our experiments. △ Less

Submitted 6 February, 2025; originally announced February 2025.

arXiv:2501.16918 [pdf, other]

On Rollouts in Model-Based Reinforcement Learning

Authors: Bernd Frauenknecht, Devdutt Subhasish, Friedrich Solowjow, Sebastian Trimpe

Abstract: Model-based reinforcement learning (MBRL) seeks to enhance data efficiency by learning a model of the environment and generating synthetic rollouts from it. However, accumulated model errors during these rollouts can distort the data distribution, negatively impacting policy learning and hindering long-term planning. Thus, the accumulation of model errors is a key bottleneck in current MBRL method… ▽ More Model-based reinforcement learning (MBRL) seeks to enhance data efficiency by learning a model of the environment and generating synthetic rollouts from it. However, accumulated model errors during these rollouts can distort the data distribution, negatively impacting policy learning and hindering long-term planning. Thus, the accumulation of model errors is a key bottleneck in current MBRL methods. We propose Infoprop, a model-based rollout mechanism that separates aleatoric from epistemic model uncertainty and reduces the influence of the latter on the data distribution. Further, Infoprop keeps track of accumulated model errors along a model rollout and provides termination criteria to limit data corruption. We demonstrate the capabilities of Infoprop in the Infoprop-Dyna algorithm, reporting state-of-the-art performance in Dyna-style MBRL on common MuJoCo benchmark tasks while substantially increasing rollout length and data quality. △ Less

Submitted 8 April, 2025; v1 submitted 28 January, 2025; originally announced January 2025.

arXiv:2501.07985 [pdf, ps, other]

CHEQ-ing the Box: Safe Variable Impedance Learning for Robotic Polishing

Authors: Emma Cramer, Lukas Jäschke, Sebastian Trimpe

Abstract: Robotic systems are increasingly employed for industrial automation, with contact-rich tasks like polishing requiring dexterity and compliant behaviour. These tasks are difficult to model, making classical control challenging. Deep reinforcement learning (RL) offers a promising solution by enabling the learning of models and control policies directly from data. However, its application to real-wor… ▽ More Robotic systems are increasingly employed for industrial automation, with contact-rich tasks like polishing requiring dexterity and compliant behaviour. These tasks are difficult to model, making classical control challenging. Deep reinforcement learning (RL) offers a promising solution by enabling the learning of models and control policies directly from data. However, its application to real-world problems is limited by data inefficiency and unsafe exploration. Adaptive hybrid RL methods blend classical control and RL adaptively, combining the strengths of both: structure from control and learning from RL. This has led to improvements in data efficiency and exploration safety. However, their potential for hardware applications remains underexplored, with no evaluations on physical systems to date. Such evaluations are critical to fully assess the practicality and effectiveness of these methods in real-world settings. This work presents an experimental demonstration of the hybrid RL algorithm CHEQ for robotic polishing with variable impedance, a task requiring precise force and velocity tracking. In simulation, we show that variable impedance enhances polishing performance. We compare standalone RL with adaptive hybrid RL, demonstrating that CHEQ achieves effective learning while adhering to safety constraints. On hardware, CHEQ achieves effective polishing behaviour, requiring only eight hours of training and incurring just five failures. These results highlight the potential of adaptive hybrid RL for real-world, contact-rich tasks trained directly on hardware. △ Less

Submitted 2 June, 2025; v1 submitted 14 January, 2025; originally announced January 2025.

arXiv:2412.09477 [pdf, other]

Bayesian Optimization via Continual Variational Last Layer Training

Authors: Paul Brunzema, Mikkel Jordahn, John Willes, Sebastian Trimpe, Jasper Snoek, James Harrison

Abstract: Gaussian Processes (GPs) are widely seen as the state-of-the-art surrogate models for Bayesian optimization (BO) due to their ability to model uncertainty and their performance on tasks where correlations are easily captured (such as those defined by Euclidean metrics) and their ability to be efficiently updated online. However, the performance of GPs depends on the choice of kernel, and kernel se… ▽ More Gaussian Processes (GPs) are widely seen as the state-of-the-art surrogate models for Bayesian optimization (BO) due to their ability to model uncertainty and their performance on tasks where correlations are easily captured (such as those defined by Euclidean metrics) and their ability to be efficiently updated online. However, the performance of GPs depends on the choice of kernel, and kernel selection for complex correlation structures is often difficult or must be made bespoke. While Bayesian neural networks (BNNs) are a promising direction for higher capacity surrogate models, they have so far seen limited use due to poor performance on some problem types. In this paper, we propose an approach which shows competitive performance on many problem types, including some that BNNs typically struggle with. We build on variational Bayesian last layers (VBLLs), and connect training of these models to exact conditioning in GPs. We exploit this connection to develop an efficient online training algorithm that interleaves conditioning and optimization. Our findings suggest that VBLL networks significantly outperform GPs and other BNN architectures on tasks with complex input correlations, and match the performance of well-tuned GPs on established benchmark tasks. △ Less

Submitted 12 December, 2024; originally announced December 2024.

arXiv:2412.00395 [pdf, other]

On Foundation Models for Dynamical Systems from Purely Synthetic Data

Authors: Martin Ziegler, Andres Felipe Posada-Moreno, Friedrich Solowjow, Sebastian Trimpe

Abstract: Foundation models have demonstrated remarkable generalization, data efficiency, and robustness properties across various domains. In this paper, we explore the feasibility of foundation models for applications in the control domain. The success of these models is enabled by large-scale pretaining on Internet-scale datasets. These are available in fields like natural language processing and compute… ▽ More Foundation models have demonstrated remarkable generalization, data efficiency, and robustness properties across various domains. In this paper, we explore the feasibility of foundation models for applications in the control domain. The success of these models is enabled by large-scale pretaining on Internet-scale datasets. These are available in fields like natural language processing and computer vision, but do not exist for dynamical systems. We address this challenge by pretraining a transformer-based foundation model exclusively on synthetic data and propose to sample dynamics functions from a reproducing kernel Hilbert space. Our pretrained model generalizes for prediction tasks across different dynamical systems, which we validate in simulation and hardware experiments, including cart-pole and Furuta pendulum setups. Additionally, the model can be fine-tuned effectively to new systems to increase performance even further. Our results demonstrate the feasibility of foundation models for dynamical systems that outperform specialist models in terms of generalization, data efficiency, and robustness. △ Less

Submitted 17 December, 2024; v1 submitted 30 November, 2024; originally announced December 2024.

Comments: 10 pages

arXiv:2411.16267 [pdf, other]

doi 10.1515/auto-2023-0181

Local Bayesian Optimization for Controller Tuning with Crash Constraints

Authors: Alexander von Rohr, David Stenger, Dominik Scheurenberg, Sebastian Trimpe

Abstract: Controller tuning is crucial for closed-loop performance but often involves manual adjustments. Although Bayesian optimization (BO) has been established as a data-efficient method for automated tuning, applying it to large and high-dimensional search spaces remains challenging. We extend a recently proposed local variant of BO to include crash constraints, where the controller can only be successf… ▽ More Controller tuning is crucial for closed-loop performance but often involves manual adjustments. Although Bayesian optimization (BO) has been established as a data-efficient method for automated tuning, applying it to large and high-dimensional search spaces remains challenging. We extend a recently proposed local variant of BO to include crash constraints, where the controller can only be successfully evaluated in an a-priori unknown feasible region. We demonstrate the efficiency of the proposed method through simulations and hardware experiments. Our findings showcase the potential of local BO to enhance controller performance and reduce the time and resources necessary for tuning. △ Less

Submitted 25 November, 2024; originally announced November 2024.

Comments: Published in at-Automatisierungstechnik

Journal ref: von Rohr, Alexander, Stenger, David, Scheurenberg, Dominik and Trimpe, Sebastian. "Local Bayesian optimization for controller tuning with crash constraints" at - Automatisierungstechnik, vol. 72, no. 4, 2024, pp. 281-292

arXiv:2411.14246 [pdf, other]

doi 10.1109/TRO.2025.3539192

Simulation-Aided Policy Tuning for Black-Box Robot Learning

Authors: Shiming He, Alexander von Rohr, Dominik Baumann, Ji Xiang, Sebastian Trimpe

Abstract: How can robots learn and adapt to new tasks and situations with little data? Systematic exploration and simulation are crucial tools for efficient robot learning. We present a novel black-box policy search algorithm focused on data-efficient policy improvements. The algorithm learns directly on the robot and treats simulation as an additional information source to speed up the learning process. At… ▽ More How can robots learn and adapt to new tasks and situations with little data? Systematic exploration and simulation are crucial tools for efficient robot learning. We present a novel black-box policy search algorithm focused on data-efficient policy improvements. The algorithm learns directly on the robot and treats simulation as an additional information source to speed up the learning process. At the core of the algorithm, a probabilistic model learns the dependence of the policy parameters and the robot learning objective not only by performing experiments on the robot, but also by leveraging data from a simulator. This substantially reduces interaction time with the robot. Using this model, we can guarantee improvements with high probability for each policy update, thereby facilitating fast, goal-oriented learning. We evaluate our algorithm on simulated fine-tuning tasks and demonstrate the data-efficiency of the proposed dual-information source optimization algorithm. In a real robot learning experiment, we show fast and successful task learning on a robot manipulator with the aid of an imperfect simulator. △ Less

Submitted 21 November, 2024; originally announced November 2024.

arXiv:2409.16875 [pdf, other]

Feedforward Controllers from Learned Dynamic Local Model Networks with Application to Excavator Assistance Functions

Authors: Leon Greiser, Ozan Demir, Benjamin Hartmann, Henrik Hose, Sebastian Trimpe

Abstract: Complicated first principles modelling and controller synthesis can be prohibitively slow and expensive for high-mix, low-volume products such as hydraulic excavators. Instead, in a data-driven approach, recorded trajectories from the real system can be used to train local model networks (LMNs), for which feedforward controllers are derived via feedback linearization. However, previous works requi… ▽ More Complicated first principles modelling and controller synthesis can be prohibitively slow and expensive for high-mix, low-volume products such as hydraulic excavators. Instead, in a data-driven approach, recorded trajectories from the real system can be used to train local model networks (LMNs), for which feedforward controllers are derived via feedback linearization. However, previous works required LMNs without zero dynamics for feedback linearization, which restricts the model structure and thus modelling capacity of LMNs. In this paper, we overcome this restriction by providing a criterion for when feedback linearization of LMNs with zero dynamics yields a valid controller. As a criterion we propose the bounded-input bounded-output stability of the resulting controller. In two additional contributions, we extend this approach to consider measured disturbance signals and multiple inputs and outputs. We illustrate the effectiveness of our contributions in a hydraulic excavator control application with hardware experiments. To this end, we train LMNs from recorded, noisy data and derive feedforward controllers used as part of a leveling assistance system on the excavator. In our experiments, incorporating disturbance signals and multiple inputs and outputs enhances tracking performance of the learned controller. A video of our experiments is available at https://youtu.be/lrrWBx2ASaE. △ Less

Submitted 25 September, 2024; originally announced September 2024.

arXiv:2407.03476 [pdf, other]

Learning deformable linear object dynamics from a single trajectory

Authors: Shamil Mamedov, A. René Geist, Ruan Viljoen, Sebastian Trimpe, Jan Swevers

Abstract: The manipulation of deformable linear objects (DLOs) via model-based control requires an accurate and computationally efficient dynamics model. Yet, data-driven DLO dynamics models require large training data sets while their predictions often do not generalize, whereas physics-based models rely on good approximations of physical phenomena and often lack accuracy. To address these challenges, we p… ▽ More The manipulation of deformable linear objects (DLOs) via model-based control requires an accurate and computationally efficient dynamics model. Yet, data-driven DLO dynamics models require large training data sets while their predictions often do not generalize, whereas physics-based models rely on good approximations of physical phenomena and often lack accuracy. To address these challenges, we propose a physics-informed neural ODE capable of predicting agile movements with significantly less data and hyper-parameter tuning. In particular, we model DLOs as serial chains of rigid bodies interconnected by passive elastic joints in which interaction forces are predicted by neural networks. The proposed model accurately predicts the motion of an robotically-actuated aluminium rod and an elastic foam cylinder after being trained on only thirty seconds of data. The project code and data are available at: \url{https://tinyurl.com/neuralprba} △ Less

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2406.19768 [pdf, other]

Contextualized Hybrid Ensemble Q-learning: Learning Fast with Control Priors

Authors: Emma Cramer, Bernd Frauenknecht, Ramil Sabirov, Sebastian Trimpe

Abstract: Combining Reinforcement Learning (RL) with a prior controller can yield the best out of two worlds: RL can solve complex nonlinear problems, while the control prior ensures safer exploration and speeds up training. Prior work largely blends both components with a fixed weight, neglecting that the RL agent's performance varies with the training progress and across regions in the state space. Theref… ▽ More Combining Reinforcement Learning (RL) with a prior controller can yield the best out of two worlds: RL can solve complex nonlinear problems, while the control prior ensures safer exploration and speeds up training. Prior work largely blends both components with a fixed weight, neglecting that the RL agent's performance varies with the training progress and across regions in the state space. Therefore, we advocate for an adaptive strategy that dynamically adjusts the weighting based on the RL agent's current capabilities. We propose a new adaptive hybrid RL algorithm, Contextualized Hybrid Ensemble Q-learning (CHEQ). CHEQ combines three key ingredients: (i) a time-invariant formulation of the adaptive hybrid RL problem treating the adaptive weight as a context variable, (ii) a weight adaption mechanism based on the parametric uncertainty of a critic ensemble, and (iii) ensemble-based acceleration for data-efficient RL. Evaluating CHEQ on a car racing task reveals substantially stronger data efficiency, exploration safety, and transferability to unknown scenarios than state-of-the-art adaptive hybrid RL methods. △ Less

Submitted 1 July, 2024; v1 submitted 28 June, 2024; originally announced June 2024.

arXiv:2406.18293 [pdf, other]

Combining Automated Optimisation of Hyperparameters and Reward Shape

Authors: Julian Dierkes, Emma Cramer, Holger H. Hoos, Sebastian Trimpe

Abstract: There has been significant progress in deep reinforcement learning (RL) in recent years. Nevertheless, finding suitable hyperparameter configurations and reward functions remains challenging even for experts, and performance heavily relies on these design choices. Also, most RL research is conducted on known benchmarks where knowledge about these choices already exists. However, novel practical ap… ▽ More There has been significant progress in deep reinforcement learning (RL) in recent years. Nevertheless, finding suitable hyperparameter configurations and reward functions remains challenging even for experts, and performance heavily relies on these design choices. Also, most RL research is conducted on known benchmarks where knowledge about these choices already exists. However, novel practical applications often pose complex tasks for which no prior knowledge about good hyperparameters and reward functions is available, thus necessitating their derivation from scratch. Prior work has examined automatically tuning either hyperparameters or reward functions individually. We demonstrate empirically that an RL algorithm's hyperparameter configurations and reward function are often mutually dependent, meaning neither can be fully optimised without appropriate values for the other. We then propose a methodology for the combined optimisation of hyperparameters and the reward function. Furthermore, we include a variance penalty as an optimisation objective to improve the stability of learned policies. We conducted extensive experiments using Proximal Policy Optimisation and Soft Actor-Critic on four environments. Our results show that combined optimisation significantly improves over baseline performance in half of the environments and achieves competitive performance in the others, with only a minor increase in computational costs. This suggests that combined optimisation should be best practice. △ Less

Submitted 9 October, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

Comments: Published in the Reinforcement Learning Journal 2024

arXiv:2406.06101 [pdf, ps, other]

On the Consistency of Kernel Methods with Dependent Observations

Authors: Pierre-François Massiani, Sebastian Trimpe, Friedrich Solowjow

Abstract: The consistency of a learning method is usually established under the assumption that the observations are a realization of an independent and identically distributed (i.i.d.) or mixing process. Yet, kernel methods such as support vector machines (SVMs), Gaussian processes, or conditional kernel mean embeddings (CKMEs) all give excellent performance under sampling schemes that are obviously non-i.… ▽ More The consistency of a learning method is usually established under the assumption that the observations are a realization of an independent and identically distributed (i.i.d.) or mixing process. Yet, kernel methods such as support vector machines (SVMs), Gaussian processes, or conditional kernel mean embeddings (CKMEs) all give excellent performance under sampling schemes that are obviously non-i.i.d., such as when data comes from a dynamical system. We propose the new notion of empirical weak convergence (EWC) as a general assumption explaining such phenomena for kernel methods. It assumes the existence of a random asymptotic data distribution and is a strict weakening of previous assumptions in the field. Our main results then establish consistency of SVMs, kernel mean embeddings, and general Hilbert-space valued empirical expectations with EWC data. Our analysis holds for both finite- and infinite-dimensional outputs, as we extend classical results of statistical learning to the latter case. In particular, it is also applicable to CKMEs. Overall, our results open new classes of processes to statistical learning and can serve as a foundation for a theory of learning beyond i.i.d. and mixing. △ Less

Submitted 10 June, 2024; originally announced June 2024.

Comments: 26 pages, 1 figure

arXiv:2405.19014 [pdf, other]

Trust the Model Where It Trusts Itself -- Model-Based Actor-Critic with Uncertainty-Aware Rollout Adaption

Authors: Bernd Frauenknecht, Artur Eisele, Devdutt Subhasish, Friedrich Solowjow, Sebastian Trimpe

Abstract: Dyna-style model-based reinforcement learning (MBRL) combines model-free agents with predictive transition models through model-based rollouts. This combination raises a critical question: 'When to trust your model?'; i.e., which rollout length results in the model providing useful data? Janner et al. (2019) address this question by gradually increasing rollout lengths throughout the training. Whi… ▽ More Dyna-style model-based reinforcement learning (MBRL) combines model-free agents with predictive transition models through model-based rollouts. This combination raises a critical question: 'When to trust your model?'; i.e., which rollout length results in the model providing useful data? Janner et al. (2019) address this question by gradually increasing rollout lengths throughout the training. While theoretically tempting, uniform model accuracy is a fallacy that collapses at the latest when extrapolating. Instead, we propose asking the question 'Where to trust your model?'. Using inherent model uncertainty to consider local accuracy, we obtain the Model-Based Actor-Critic with Uncertainty-Aware Rollout Adaption (MACURA) algorithm. We propose an easy-to-tune rollout mechanism and demonstrate substantial improvements in data efficiency and performance compared to state-of-the-art deep MBRL methods on the MuJoCo benchmark. △ Less

Submitted 21 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

arXiv:2405.10618 [pdf, other]

Distributed Event-Based Learning via ADMM

Authors: Guner Dilsad Er, Sebastian Trimpe, Michael Muehlebach

Abstract: We consider a distributed learning problem, where agents minimize a global objective function by exchanging information over a network. Our approach has two distinct features: (i) It substantially reduces communication by triggering communication only when necessary, and (ii) it is agnostic to the data-distribution among the different agents. We therefore guarantee convergence even if the local da… ▽ More We consider a distributed learning problem, where agents minimize a global objective function by exchanging information over a network. Our approach has two distinct features: (i) It substantially reduces communication by triggering communication only when necessary, and (ii) it is agnostic to the data-distribution among the different agents. We therefore guarantee convergence even if the local data-distributions of the agents are arbitrarily distinct. We analyze the convergence rate of the algorithm both in convex and nonconvex settings and derive accelerated convergence rates for the convex case. We also characterize the effect of communication failures and demonstrate that our algorithm is robust to these. The article concludes by presenting numerical results from distributed learning tasks on the MNIST and CIFAR-10 datasets. The experiments underline communication savings of 35% or more due to the event-based communication strategy, show resilience towards heterogeneous data-distributions, and highlight that our approach outperforms common baselines such as FedAvg, FedProx, SCAFFOLD and FedADMM. △ Less

Submitted 6 February, 2025; v1 submitted 17 May, 2024; originally announced May 2024.

Comments: 35 pages, 12 figures

arXiv:2404.05835 [pdf, other]

Parameter-Adaptive Approximate MPC: Tuning Neural-Network Controllers without Retraining

Authors: Henrik Hose, Alexander Gräfe, Sebastian Trimpe

Abstract: Model Predictive Control (MPC) is a method to control nonlinear systems with guaranteed stability and constraint satisfaction but suffers from high computation times. Approximate MPC (AMPC) with neural networks (NNs) has emerged to address this limitation, enabling deployment on resource-constrained embedded systems. However, when tuning AMPCs for real-world systems, large datasets need to be rege… ▽ More Model Predictive Control (MPC) is a method to control nonlinear systems with guaranteed stability and constraint satisfaction but suffers from high computation times. Approximate MPC (AMPC) with neural networks (NNs) has emerged to address this limitation, enabling deployment on resource-constrained embedded systems. However, when tuning AMPCs for real-world systems, large datasets need to be regenerated and the NN needs to be retrained at every tuning step. This work introduces a novel, parameter-adaptive AMPC architecture capable of online tuning without recomputing large datasets and retraining. By incorporating local sensitivities of nonlinear programs, the proposed method not only mimics optimal MPC inputs but also adjusts to known changes in physical parameters of the model using linear predictions while still guaranteeing stability. We showcase the effectiveness of parameter-adaptive AMPC by controlling the swing-ups of two different real cartpole systems with a severely resource-constrained microcontroller (MCU). We use the same NN across both system instances that have different parameters. This work not only represents the first experimental demonstration of AMPC for fast-moving systems on low-cost MCUs to the best of our knowledge, but also showcases generalization across system instances and variations through our parameter-adaptation method. Taken together, these contributions represent a marked step toward the practical application of AMPC in real-world systems. △ Less

Submitted 6 June, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

Comments: Accepted to L4DC 2024

Journal ref: PMLR 242:349-360, 2024

arXiv:2403.12948 [pdf, other]

On Safety in Safe Bayesian Optimization

Authors: Christian Fiedler, Johanna Menn, Lukas Kreisköther, Sebastian Trimpe

Abstract: Optimizing an unknown function under safety constraints is a central task in robotics, biomedical engineering, and many other disciplines, and increasingly safe Bayesian Optimization (BO) is used for this. Due to the safety critical nature of these applications, it is of utmost importance that theoretical safety guarantees for these algorithms translate into the real world. In this work, we invest… ▽ More Optimizing an unknown function under safety constraints is a central task in robotics, biomedical engineering, and many other disciplines, and increasingly safe Bayesian Optimization (BO) is used for this. Due to the safety critical nature of these applications, it is of utmost importance that theoretical safety guarantees for these algorithms translate into the real world. In this work, we investigate three safety-related issues of the popular class of SafeOpt-type algorithms. First, these algorithms critically rely on frequentist uncertainty bounds for Gaussian Process (GP) regression, but concrete implementations typically utilize heuristics that invalidate all safety guarantees. We provide a detailed analysis of this problem and introduce Real-\b{eta}-SafeOpt, a variant of the SafeOpt algorithm that leverages recent GP bounds and thus retains all theoretical guarantees. Second, we identify assuming an upper bound on the reproducing kernel Hilbert space (RKHS) norm of the target function, a key technical assumption in SafeOpt-like algorithms, as a central obstacle to real-world usage. To overcome this challenge, we introduce the Lipschitz-only Safe Bayesian Optimization (LoSBO) algorithm, which guarantees safety without an assumption on the RKHS bound, and empirically show that this algorithm is not only safe, but also exhibits superior performance compared to the state-of-the-art on several function classes. Third, SafeOpt and derived algorithms rely on a discrete search space, making them difficult to apply to higher-dimensional problems. To widen the applicability of these algorithms, we introduce Lipschitz-only GP-UCB (LoS-GP-UCB), a variant of LoSBO applicable to moderately high-dimensional problems, while retaining safety. △ Less

Submitted 19 March, 2024; originally announced March 2024.

arXiv:2312.10199 [pdf, other]

Automatic nonlinear MPC approximation with closed-loop guarantees

Authors: Abdullah Tokmak, Christian Fiedler, Melanie N. Zeilinger, Sebastian Trimpe, Johannes Köhler

Abstract: Safety guarantees are vital in many control applications, such as robotics. Model predictive control (MPC) provides a constructive framework for controlling safety-critical systems, but is limited by its computational complexity. We address this problem by presenting a novel algorithm that automatically computes an explicit approximation to nonlinear MPC schemes while retaining closed-loop guarant… ▽ More Safety guarantees are vital in many control applications, such as robotics. Model predictive control (MPC) provides a constructive framework for controlling safety-critical systems, but is limited by its computational complexity. We address this problem by presenting a novel algorithm that automatically computes an explicit approximation to nonlinear MPC schemes while retaining closed-loop guarantees. Specifically, the problem can be reduced to a function approximation problem, which we then tackle by proposing ALKIA-X, the Adaptive and Localized Kernel Interpolation Algorithm with eXtrapolated reproducing kernel Hilbert space norm. ALKIA-X is a non-iterative algorithm that ensures numerically well-conditioned computations, a fast-to-evaluate approximating function, and the guaranteed satisfaction of any desired bound on the approximation error. Hence, ALKIA-X automatically computes an explicit function that approximates the MPC, yielding a controller suitable for safety-critical systems and high sampling rates. We apply ALKIA-X to approximate two nonlinear MPC schemes, demonstrating reduced computational demand and applicability to realistic problems. △ Less

Submitted 11 April, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

Comments: Submitted to IEEE Transactions on Automatic Control. Compared to the previously uploaded version, this version contains an additional numerical example

arXiv:2312.00592 [pdf, other]

Tracking Object Positions in Reinforcement Learning: A Metric for Keypoint Detection (extended version)

Authors: Emma Cramer, Jonas Reiher, Sebastian Trimpe

Abstract: Reinforcement learning (RL) for robot control typically requires a detailed representation of the environment state, including information about task-relevant objects not directly measurable. Keypoint detectors, such as spatial autoencoders (SAEs), are a common approach to extracting a low-dimensional representation from high-dimensional image data. SAEs aim at spatial features such as object posi… ▽ More Reinforcement learning (RL) for robot control typically requires a detailed representation of the environment state, including information about task-relevant objects not directly measurable. Keypoint detectors, such as spatial autoencoders (SAEs), are a common approach to extracting a low-dimensional representation from high-dimensional image data. SAEs aim at spatial features such as object positions, which are often useful representations in robotic RL. However, whether an SAE is actually able to track objects in the scene and thus yields a spatial state representation well suited for RL tasks has rarely been examined due to a lack of established metrics. In this paper, we propose to assess the performance of an SAE instance by measuring how well keypoints track ground truth objects in images. We present a computationally lightweight metric and use it to evaluate common baseline SAE architectures on image data from a simulated robot task. We find that common SAEs differ substantially in their spatial extraction capability. Furthermore, we validate that SAEs that perform well in our metric achieve superior performance when used in downstream RL. Thus, our metric is an effective and lightweight indicator of RL performance before executing expensive RL training. Building on these insights, we identify three key modifications of SAE architectures to improve tracking performance. △ Less

Submitted 2 July, 2024; v1 submitted 1 December, 2023; originally announced December 2023.

Comments: 19 pages, 12 figures

arXiv:2311.18393 [pdf, other]

Data-efficient Deep Reinforcement Learning for Vehicle Trajectory Control

Authors: Bernd Frauenknecht, Tobias Ehlgen, Sebastian Trimpe

Abstract: Advanced vehicle control is a fundamental building block in the development of autonomous driving systems. Reinforcement learning (RL) promises to achieve control performance superior to classical approaches while keeping computational demands low during deployment. However, standard RL approaches like soft-actor critic (SAC) require extensive amounts of training data to be collected and are thus… ▽ More Advanced vehicle control is a fundamental building block in the development of autonomous driving systems. Reinforcement learning (RL) promises to achieve control performance superior to classical approaches while keeping computational demands low during deployment. However, standard RL approaches like soft-actor critic (SAC) require extensive amounts of training data to be collected and are thus impractical for real-world application. To address this issue, we apply recently developed data-efficient deep RL methods to vehicle trajectory control. Our investigation focuses on three methods, so far unexplored for vehicle control: randomized ensemble double Q-learning (REDQ), probabilistic ensembles with trajectory sampling and model predictive path integral optimizer (PETS-MPPI), and model-based policy optimization (MBPO). We find that in the case of trajectory control, the standard model-based RL formulation used in approaches like PETS-MPPI and MBPO is not suitable. We, therefore, propose a new formulation that splits dynamics prediction and vehicle localization. Our benchmark study on the CARLA simulator reveals that the three identified data-efficient deep RL approaches learn control strategies on a par with or better than SAC, yet reduce the required number of environment interactions by more than one order of magnitude. △ Less

Submitted 30 November, 2023; originally announced November 2023.

arXiv:2310.18074 [pdf, ps, other]

On kernel-based statistical learning in the mean field limit

Authors: Christian Fiedler, Michael Herty, Sebastian Trimpe

Abstract: In many applications of machine learning, a large number of variables are considered. Motivated by machine learning of interacting particle systems, we consider the situation when the number of input variables goes to infinity. First, we continue the recent investigation of the mean field limit of kernels and their reproducing kernel Hilbert spaces, completing the existing theory. Next, we provide… ▽ More In many applications of machine learning, a large number of variables are considered. Motivated by machine learning of interacting particle systems, we consider the situation when the number of input variables goes to infinity. First, we continue the recent investigation of the mean field limit of kernels and their reproducing kernel Hilbert spaces, completing the existing theory. Next, we provide results relevant for approximation with such kernels in the mean field limit, including a representer theorem. Finally, we use these kernels in the context of statistical learning in the mean field limit, focusing on Support Vector Machines. In particular, we show mean field convergence of empirical and infinite-sample solutions as well as the convergence of the corresponding risks. On the one hand, our results establish rigorous mean field limits in the context of kernel methods, providing new theoretical tools and insights for large-scale problems. On the other hand, our setting corresponds to a new form of limit of learning problems, which seems to have not been investigated yet in the statistical learning theory literature. △ Less

Submitted 27 October, 2023; originally announced October 2023.

Comments: NeurIPS 2023

arXiv:2309.02873 [pdf, other]

Learning Hybrid Dynamics Models With Simulator-Informed Latent States

Authors: Katharina Ensinger, Sebastian Ziesche, Sebastian Trimpe

Abstract: Dynamics model learning deals with the task of inferring unknown dynamics from measurement data and predicting the future behavior of the system. A typical approach to address this problem is to train recurrent models. However, predictions with these models are often not physically meaningful. Further, they suffer from deteriorated behavior over time due to accumulating errors. Often, simulators b… ▽ More Dynamics model learning deals with the task of inferring unknown dynamics from measurement data and predicting the future behavior of the system. A typical approach to address this problem is to train recurrent models. However, predictions with these models are often not physically meaningful. Further, they suffer from deteriorated behavior over time due to accumulating errors. Often, simulators building on first principles are available being physically meaningful by design. However, modeling simplifications typically cause inaccuracies in these models. Consequently, hybrid modeling is an emerging trend that aims to combine the best of both worlds. In this paper, we propose a new approach to hybrid modeling, where we inform the latent states of a learned model via a black-box simulator. This allows to control the predictions via the simulator preventing them from accumulating errors. This is especially challenging since, in contrast to previous approaches, access to the simulator's latent states is not available. We tackle the task by leveraging observers, a well-known concept from control theory, inferring unknown latent states from observations and dynamics over time. In our learning-based setting, we jointly learn the dynamics and an observer that infers the latent states via the simulator. Thus, the simulator constantly corrects the latent states, compensating for modeling mismatch caused by learning. To maintain flexibility, we train an RNN-based residuum for the latent states that cannot be informed by the simulator. △ Less

Submitted 29 January, 2024; v1 submitted 6 September, 2023; originally announced September 2023.

Comments: Accepted at The 38th Annual AAAI Conference on Artificial Intelligence, 2024

arXiv:2309.02351 [pdf, other]

Exact Inference for Continuous-Time Gaussian Process Dynamics

Authors: Katharina Ensinger, Nicholas Tagliapietra, Sebastian Ziesche, Sebastian Trimpe

Abstract: Physical systems can often be described via a continuous-time dynamical system. In practice, the true system is often unknown and has to be learned from measurement data. Since data is typically collected in discrete time, e.g. by sensors, most methods in Gaussian process (GP) dynamics model learning are trained on one-step ahead predictions. This can become problematic in several scenarios, e.g.… ▽ More Physical systems can often be described via a continuous-time dynamical system. In practice, the true system is often unknown and has to be learned from measurement data. Since data is typically collected in discrete time, e.g. by sensors, most methods in Gaussian process (GP) dynamics model learning are trained on one-step ahead predictions. This can become problematic in several scenarios, e.g. if measurements are provided at irregularly-sampled time steps or physical system properties have to be conserved. Thus, we aim for a GP model of the true continuous-time dynamics. Higher-order numerical integrators provide the necessary tools to address this problem by discretizing the dynamics function with arbitrary accuracy. Many higher-order integrators require dynamics evaluations at intermediate time steps making exact GP inference intractable. In previous work, this problem is often tackled by approximating the GP posterior with variational inference. However, exact GP inference is preferable in many scenarios, e.g. due to its mathematical guarantees. In order to make direct inference tractable, we propose to leverage multistep and Taylor integrators. We demonstrate how to derive flexible inference schemes for these types of integrators. Further, we derive tailored sampling schemes that allow to draw consistent dynamics functions from the learned posterior. This is crucial to sample consistent predictions from the dynamics model. We demonstrate empirically and theoretically that our approach yields an accurate representation of the continuous-time system. △ Less

Submitted 29 January, 2024; v1 submitted 5 September, 2023; originally announced September 2023.

Comments: Accepted at The 38th Annual AAAI Conference on Artificial Intelligence. 2024

arXiv:2308.06022 [pdf, other]

Scale-Preserving Automatic Concept Extraction (SPACE)

Authors: Andrés Felipe Posada-Moreno, Lukas Kreisköther, Tassilo Glander, Sebastian Trimpe

Abstract: Convolutional Neural Networks (CNN) have become a common choice for industrial quality control, as well as other critical applications in the Industry 4.0. When these CNNs behave in ways unexpected to human users or developers, severe consequences can arise, such as economic losses or an increased risk to human life. Concept extraction techniques can be applied to increase the reliability and tran… ▽ More Convolutional Neural Networks (CNN) have become a common choice for industrial quality control, as well as other critical applications in the Industry 4.0. When these CNNs behave in ways unexpected to human users or developers, severe consequences can arise, such as economic losses or an increased risk to human life. Concept extraction techniques can be applied to increase the reliability and transparency of CNNs through generating global explanations for trained neural network models. The decisive features of image datasets in quality control often depend on the feature's scale; for example, the size of a hole or an edge. However, existing concept extraction methods do not correctly represent scale, which leads to problems interpreting these models as we show herein. To address this issue, we introduce the Scale-Preserving Automatic Concept Extraction (SPACE) algorithm, as a state-of-the-art alternative concept extraction technique for CNNs, focused on industrial applications. SPACE is specifically designed to overcome the aforementioned problems by avoiding scale changes throughout the concept extraction process. SPACE proposes an approach based on square slices of input images, which are selected and then tiled before being clustered into concepts. Our method provides explanations of the models' decision-making process in the form of human-understandable concepts. We evaluate SPACE on three image classification datasets in the context of industrial quality control. Through experimental results, we illustrate how SPACE outperforms other methods and provides actionable insights on the decision mechanisms of CNNs. Finally, code for the implementation of SPACE is provided. △ Less

Submitted 11 August, 2023; originally announced August 2023.

Comments: 22 pages, 7 figures

arXiv:2307.07975 [pdf, other]

Pseudo-rigid body networks: learning interpretable deformable object dynamics from partial observations

Authors: Shamil Mamedov, A. René Geist, Jan Swevers, Sebastian Trimpe

Abstract: Accurately predicting deformable linear object (DLO) dynamics is challenging, especially when the task requires a model that is both human-interpretable and computationally efficient. In this work, we draw inspiration from the pseudo-rigid body method (PRB) and model a DLO as a serial chain of rigid bodies whose internal state is unrolled through time by a dynamics network. This dynamics network i… ▽ More Accurately predicting deformable linear object (DLO) dynamics is challenging, especially when the task requires a model that is both human-interpretable and computationally efficient. In this work, we draw inspiration from the pseudo-rigid body method (PRB) and model a DLO as a serial chain of rigid bodies whose internal state is unrolled through time by a dynamics network. This dynamics network is trained jointly with a physics-informed encoder that maps observed motion variables to the DLO's hidden state. To encourage the state to acquire a physically meaningful representation, we leverage the forward kinematics of the PRB model as a decoder. We demonstrate in robot experiments that the proposed DLO dynamics model provides physically interpretable predictions from partial observations while being on par with black-box models regarding prediction accuracy. The project code is available at: http://tinyurl.com/prb-networks △ Less

Submitted 10 September, 2024; v1 submitted 16 July, 2023; originally announced July 2023.

Comments: This work has been submitted to the IEEE for possible publication

arXiv:2306.16973 [pdf, other]

Robust Direct Data-Driven Control for Probabilistic Systems

Authors: Alexander von Rohr, Dmitrii Likhachev, Sebastian Trimpe

Abstract: We propose a data-driven control method for systems with aleatoric uncertainty, for example, robot fleets with variations between agents. Our method leverages shared trajectory data to increase the robustness of the designed controller and thus facilitate transfer to new variations without the need for prior parameter and uncertainty estimations. In contrast to existing work on experience transfer… ▽ More We propose a data-driven control method for systems with aleatoric uncertainty, for example, robot fleets with variations between agents. Our method leverages shared trajectory data to increase the robustness of the designed controller and thus facilitate transfer to new variations without the need for prior parameter and uncertainty estimations. In contrast to existing work on experience transfer for performance, our approach focuses on robustness and uses data collected from multiple realizations to guarantee generalization to unseen ones. Our method is based on scenario optimization combined with recent formulations for direct data-driven control. We derive lower bounds on the amount of data required to achieve quadratic stability for probabilistic systems with aleatoric uncertainty and demonstrate the benefits of our data-driven method through a numerical example. We find that the learned controllers generalize well to high variations in the dynamics even when based on only a few short open-loop trajectories. Robust experience transfer enables the design of safe and robust controllers that work out of the box without any additional learning during deployment. △ Less

Submitted 22 March, 2024; v1 submitted 29 June, 2023; originally announced June 2023.

arXiv:2306.03551 [pdf, other]

Scalable Concept Extraction in Industry 4.0

Authors: Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, Sebastian Trimpe

Abstract: The industry 4.0 is leveraging digital technologies and machine learning techniques to connect and optimize manufacturing processes. Central to this idea is the ability to transform raw data into human understandable knowledge for reliable data-driven decision-making. Convolutional Neural Networks (CNNs) have been instrumental in processing image data, yet, their ``black box'' nature complicates t… ▽ More The industry 4.0 is leveraging digital technologies and machine learning techniques to connect and optimize manufacturing processes. Central to this idea is the ability to transform raw data into human understandable knowledge for reliable data-driven decision-making. Convolutional Neural Networks (CNNs) have been instrumental in processing image data, yet, their ``black box'' nature complicates the understanding of their prediction process. In this context, recent advances in the field of eXplainable Artificial Intelligence (XAI) have proposed the extraction and localization of concepts, or which visual cues intervene on the prediction process of CNNs. This paper tackles the application of concept extraction (CE) methods to industry 4.0 scenarios. To this end, we modify a recently developed technique, ``Extracting Concepts with Local Aggregated Descriptors'' (ECLAD), improving its scalability. Specifically, we propose a novel procedure for calculating concept importance, utilizing a wrapper function designed for CNNs. This process is aimed at decreasing the number of times each image needs to be evaluated. Subsequently, we demonstrate the potential of CE methods, by applying them in three industrial use cases. We selected three representative use cases in the context of quality control for material design (tailored textiles), manufacturing (carbon fiber reinforcement), and maintenance (photovoltaic module inspection). In these examples, CE was able to successfully extract and locate concepts directly related to each task. This is, the visual cues related to each concept, coincided with what human experts would use to perform the task themselves, even when the visual cues were entangled between multiple classes. Through empirical results, we show that CE can be applied for understanding CNNs in an industrial context, giving useful insights that can relate to domain knowledge. △ Less

Submitted 6 June, 2023; originally announced June 2023.

arXiv:2304.09575 [pdf, ps, other]

Approximate non-linear model predictive control with safety-augmented neural networks

Authors: Henrik Hose, Johannes Köhler, Melanie N. Zeilinger, Sebastian Trimpe

Abstract: Model predictive control (MPC) achieves stability and constraint satisfaction for general nonlinear systems, but requires computationally expensive online optimization. This paper studies approximations of such MPC controllers via neural networks (NNs) to achieve fast online evaluation. We propose safety augmentation that yields deterministic guarantees for convergence and constraint satisfaction… ▽ More Model predictive control (MPC) achieves stability and constraint satisfaction for general nonlinear systems, but requires computationally expensive online optimization. This paper studies approximations of such MPC controllers via neural networks (NNs) to achieve fast online evaluation. We propose safety augmentation that yields deterministic guarantees for convergence and constraint satisfaction despite approximation inaccuracies. We approximate the entire input sequence of the MPC with NNs, which allows us to verify online if it is a feasible solution to the MPC problem. We replace the NN solution by a safe candidate based on standard MPC techniques whenever it is infeasible or has worse cost. Our method requires a single evaluation of the NN and forward integration of the input sequence online, which is fast to compute on resource-constrained systems. The proposed control framework is illustrated using two numerical non-linear MPC benchmarks of different complexity, demonstrating computational speedups that are orders of magnitude higher than online optimization. In the examples, we achieve deterministic safety through the safety-augmented NNs, where a naive NN implementation fails. △ Less

Submitted 8 October, 2024; v1 submitted 19 April, 2023; originally announced April 2023.

arXiv:2303.04930 [pdf, other]

doi 10.1109/TASE.2023.3296569

Multimodal Multi-User Surface Recognition with the Kernel Two-Sample Test

Authors: Behnam Khojasteh, Friedrich Solowjow, Sebastian Trimpe, Katherine J. Kuchenbecker

Abstract: Machine learning and deep learning have been used extensively to classify physical surfaces through images and time-series contact data. However, these methods rely on human expertise and entail the time-consuming processes of data and parameter tuning. To overcome these challenges, we propose an easily implemented framework that can directly handle heterogeneous data sources for classification ta… ▽ More Machine learning and deep learning have been used extensively to classify physical surfaces through images and time-series contact data. However, these methods rely on human expertise and entail the time-consuming processes of data and parameter tuning. To overcome these challenges, we propose an easily implemented framework that can directly handle heterogeneous data sources for classification tasks. Our data-versus-data approach automatically quantifies distinctive differences in distributions in a high-dimensional space via kernel two-sample testing between two sets extracted from multimodal data (e.g., images, sounds, haptic signals). We demonstrate the effectiveness of our technique by benchmarking against expertly engineered classifiers for visual-audio-haptic surface recognition due to the industrial relevance, difficulty, and competitive baselines of this application; ablation studies confirm the utility of key components of our pipeline. As shown in our open-source code, we achieve 97.2% accuracy on a standard multi-user dataset with 108 surface classes, outperforming the state-of-the-art machine-learning algorithm by 6% on a more difficult version of the task. The fact that our classifier obtains this performance with minimal data processing in the standard algorithm setting reinforces the powerful nature of kernel methods for learning to recognize complex patterns. △ Less

Submitted 8 March, 2023; originally announced March 2023.

arXiv:2302.14446 [pdf, ps, other]

Reproducing kernel Hilbert spaces in the mean field limit

Authors: Christian Fiedler, Michael Herty, Michael Rom, Chiara Segala, Sebastian Trimpe

Abstract: Kernel methods, being supported by a well-developed theory and coming with efficient algorithms, are among the most popular and successful machine learning techniques. From a mathematical point of view, these methods rest on the concept of kernels and function spaces generated by kernels, so called reproducing kernel Hilbert spaces. Motivated by recent developments of learning approaches in the co… ▽ More Kernel methods, being supported by a well-developed theory and coming with efficient algorithms, are among the most popular and successful machine learning techniques. From a mathematical point of view, these methods rest on the concept of kernels and function spaces generated by kernels, so called reproducing kernel Hilbert spaces. Motivated by recent developments of learning approaches in the context of interacting particle systems, we investigate kernel methods acting on data with many measurement variables. We show the rigorous mean field limit of kernels and provide a detailed analysis of the limiting reproducing kernel Hilbert space. Furthermore, several examples of kernels, that allow a rigorous mean field limit, are presented. △ Less

Submitted 17 March, 2023; v1 submitted 28 February, 2023; originally announced February 2023.

Comments: Updated author email addresses

MSC Class: 46E22; 82B40; 74A25; 82C40

arXiv:2302.13754 [pdf, other]

Combining Slow and Fast: Complementary Filtering for Dynamics Learning

Authors: Katharina Ensinger, Sebastian Ziesche, Barbara Rakitsch, Michael Tiemann, Sebastian Trimpe

Abstract: Modeling an unknown dynamical system is crucial in order to predict the future behavior of the system. A standard approach is training recurrent models on measurement data. While these models typically provide exact short-term predictions, accumulating errors yield deteriorated long-term behavior. In contrast, models with reliable long-term predictions can often be obtained, either by training a r… ▽ More Modeling an unknown dynamical system is crucial in order to predict the future behavior of the system. A standard approach is training recurrent models on measurement data. While these models typically provide exact short-term predictions, accumulating errors yield deteriorated long-term behavior. In contrast, models with reliable long-term predictions can often be obtained, either by training a robust but less detailed model, or by leveraging physics-based simulations. In both cases, inaccuracies in the models yield a lack of short-time details. Thus, different models with contrastive properties on different time horizons are available. This observation immediately raises the question: Can we obtain predictions that combine the best of both worlds? Inspired by sensor fusion tasks, we interpret the problem in the frequency domain and leverage classical methods from signal processing, in particular complementary filters. This filtering technique combines two signals by applying a high-pass filter to one signal, and low-pass filtering the other. Essentially, the high-pass filter extracts high-frequencies, whereas the low-pass filter extracts low frequencies. Applying this concept to dynamics model learning enables the construction of models that yield accurate long- and short-term predictions. Here, we propose two methods, one being purely learning-based and the other one being a hybrid model that requires an additional physics-based simulator. △ Less

Submitted 1 March, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

arXiv:2302.11979 [pdf, other]

doi 10.1109/TAC.2023.3346812

Data-Driven Observability Analysis for Nonlinear Stochastic Systems

Authors: Pierre-François Massiani, Mona Buisson-Fenet, Friedrich Solowjow, Florent Di Meglio, Sebastian Trimpe

Abstract: Distinguishability and, by extension, observability are key properties of dynamical systems. Establishing these properties is challenging, especially when no analytical model is available and they are to be inferred directly from measurement data. The presence of noise further complicates this analysis, as standard notions of distinguishability are tailored to deterministic systems. We build on di… ▽ More Distinguishability and, by extension, observability are key properties of dynamical systems. Establishing these properties is challenging, especially when no analytical model is available and they are to be inferred directly from measurement data. The presence of noise further complicates this analysis, as standard notions of distinguishability are tailored to deterministic systems. We build on distributional distinguishability, which extends the deterministic notion by comparing distributions of outputs of stochastic systems. We first show that both concepts are equivalent for a class of systems that includes linear systems. We then present a method to assess and quantify distributional distinguishability from output data. Specifically, our quantification measures how much data is required to tell apart two initial states, inducing a continuous spectrum of distinguishability. We propose a statistical test to determine a threshold above which two states can be considered distinguishable with high confidence. We illustrate these tools by computing distinguishability maps over the state space in simulation, then leverage the test to compare sensor configurations on hardware. △ Less

Submitted 7 June, 2024; v1 submitted 23 February, 2023; originally announced February 2023.

Comments: 9 pages, 3 figures

Journal ref: IEEE Transactions of Automatic Control 69 (2023) 4042 -- 4049

arXiv:2208.10790 [pdf, other]

Event-Triggered Time-Varying Bayesian Optimization

Authors: Paul Brunzema, Alexander von Rohr, Friedrich Solowjow, Sebastian Trimpe

Abstract: We consider the problem of sequentially optimizing a time-varying objective function using time-varying Bayesian optimization (TVBO). Current approaches to TVBO require prior knowledge of a constant rate of change to cope with stale data arising from time variations. However, in practice, the rate of change is usually unknown. We propose an event-triggered algorithm, ET-GP-UCB, that treats the opt… ▽ More We consider the problem of sequentially optimizing a time-varying objective function using time-varying Bayesian optimization (TVBO). Current approaches to TVBO require prior knowledge of a constant rate of change to cope with stale data arising from time variations. However, in practice, the rate of change is usually unknown. We propose an event-triggered algorithm, ET-GP-UCB, that treats the optimization problem as static until it detects changes in the objective function and then resets the dataset. This allows the algorithm to adapt online to realized temporal changes without the need for exact prior knowledge. The event trigger is based on probabilistic uniform error bounds used in Gaussian process regression. We derive regret bounds for adaptive resets without exact prior knowledge of the temporal changes and show in numerical experiments that ET-GP-UCB outperforms competing GP-UCB algorithms on both synthetic and real-world data. The results demonstrate that ET-GP-UCB is readily applicable without extensive hyperparameter tuning. △ Less

Submitted 4 February, 2025; v1 submitted 23 August, 2022; originally announced August 2022.

Comments: Published in Transactions on Machine Learning Research (TMLR)

arXiv:2208.02315 [pdf, other]

doi 10.1109/MRA.2021.3129448

Learning Fast and Precise Pixel-to-Torque Control

Authors: Steffen Bleher, Steve Heim, Sebastian Trimpe

Abstract: In the field, robots often need to operate in unknown and unstructured environments, where accurate sensing and state estimation (SE) becomes a major challenge. Cameras have been used to great success in mapping and planning in such environments, as well as complex but quasi-static tasks such as grasping, but are rarely integrated into the control loop for unstable systems. Learning pixel-to-torqu… ▽ More In the field, robots often need to operate in unknown and unstructured environments, where accurate sensing and state estimation (SE) becomes a major challenge. Cameras have been used to great success in mapping and planning in such environments, as well as complex but quasi-static tasks such as grasping, but are rarely integrated into the control loop for unstable systems. Learning pixel-to-torque control promises to allow robots to flexibly handle a wider variety of tasks. Although they do not present additional theoretical obstacles, learning pixel-to-torque control for unstable systems that that require precise and high bandwidth control still poses a significant practical challenge, and best practices have not yet been established. To help drive reproducible research on the practical aspects of learning pixel-to-torque control, we propose a platform that can flexibly represent the entire process, from lab to deployment, for learning pixel-to-torque control on a robot with fast, unstable dynamics: the vision-based Furuta pendulum. The platform can be reproduced with either off-the-shelf or custom-built hardware. We expect that this platform will allow researchers to quickly and systematically test different approaches, as well as reproduce and benchmark case studies from other labs. We also present a first case study on this system using DNNs which, to the best of our knowledge, is the first demonstration of learning pixel-to-torque control on an unstable system with update rates faster than 100 Hz. A video synopsis can be found online at https://youtu.be/S2llScfG-8E, and in the supplementary material. △ Less

Submitted 3 August, 2022; originally announced August 2022.

Comments: video: https://www.youtube.com/watch?v=S2llScfG-8E 9 pages. Published in Robotics and Automation Magazine

arXiv:2207.14252 [pdf, other]

doi 10.1109/CDC51059.2022.9993350

Improving the Performance of Robust Control through Event-Triggered Learning

Authors: Alexander von Rohr, Friedrich Solowjow, Sebastian Trimpe

Abstract: Robust controllers ensure stability in feedback loops designed under uncertainty but at the cost of performance. Model uncertainty in time-invariant systems can be reduced by recently proposed learning-based methods, which improve the performance of robust controllers using data. However, in practice, many systems also exhibit uncertainty in the form of changes over time, e.g., due to weight shift… ▽ More Robust controllers ensure stability in feedback loops designed under uncertainty but at the cost of performance. Model uncertainty in time-invariant systems can be reduced by recently proposed learning-based methods, which improve the performance of robust controllers using data. However, in practice, many systems also exhibit uncertainty in the form of changes over time, e.g., due to weight shifts or wear and tear, leading to decreased performance or instability of the learning-based controller. We propose an event-triggered learning algorithm that decides when to learn in the face of uncertainty in the LQR problem with rare or slow changes. Our key idea is to switch between robust and learned controllers. For learning, we first approximate the optimal length of the learning phase via Monte-Carlo estimations using a probabilistic model. We then design a statistical test for uncertain systems based on the moment-generating function of the LQR cost. The test detects changes in the system under control and triggers re-learning when control performance deteriorates due to system changes. We demonstrate improved performance over a robust controller baseline in a numerical example. △ Less

Submitted 21 September, 2022; v1 submitted 28 July, 2022; originally announced July 2022.

Comments: To appear in the proceedings of the 61st IEEE Conference on Decision and Control

arXiv:2207.11120 [pdf, other]

doi 10.1109/CDC51059.2022.9992649

On Controller Tuning with Time-Varying Bayesian Optimization

Authors: Paul Brunzema, Alexander von Rohr, Sebastian Trimpe

Abstract: Changing conditions or environments can cause system dynamics to vary over time. To ensure optimal control performance, controllers should adapt to these changes. When the underlying cause and time of change is unknown, we need to rely on online data for this adaptation. In this paper, we will use time-varying Bayesian optimization (TVBO) to tune controllers online in changing environments using a… ▽ More Changing conditions or environments can cause system dynamics to vary over time. To ensure optimal control performance, controllers should adapt to these changes. When the underlying cause and time of change is unknown, we need to rely on online data for this adaptation. In this paper, we will use time-varying Bayesian optimization (TVBO) to tune controllers online in changing environments using appropriate prior knowledge on the control objective and its changes. Two properties are characteristic of many online controller tuning problems: First, they exhibit incremental and lasting changes in the objective due to changes to the system dynamics, e.g., through wear and tear. Second, the optimization problem is convex in the tuning parameters. Current TVBO methods do not explicitly account for these properties, resulting in poor tuning performance and many unstable controllers through over-exploration of the parameter space. We propose a novel TVBO forgetting strategy using Uncertainty-Injection (UI), which incorporates the assumption of incremental and lasting changes. The control objective is modeled as a spatio-temporal Gaussian process (GP) with UI through a Wiener process in the temporal domain. Further, we explicitly model the convexity assumptions in the spatial dimension through GP models with linear inequality constraints. In numerical experiments, we show that our model outperforms the state-of-the-art method in TVBO, exhibiting reduced regret and fewer unstable parameter configurations. △ Less

Submitted 22 July, 2022; originally announced July 2022.

Comments: To appear in the proceedings of the 61st IEEE Conference on Decision and Control

Journal ref: IEEE 61st Conference on Decision and Control (2022), p. 4046-4052

arXiv:2207.06988 [pdf, other]

The Wheelbot: A Jumping Reaction Wheel Unicycle

Authors: A. René Geist, Jonathan Fiene, Naomi Tashiro, Zheng Jia, Sebastian Trimpe

Abstract: Combining off-the-shelf components with 3D-printing, the Wheelbot is a symmetric reaction wheel unicycle that can jump onto its wheels from any initial position. With non-holonomic and under-actuated dynamics, as well as two coupled unstable degrees of freedom, the Wheelbot provides a challenging platform for nonlinear and data-driven control research. This paper presents the Wheelbot's mechanical… ▽ More Combining off-the-shelf components with 3D-printing, the Wheelbot is a symmetric reaction wheel unicycle that can jump onto its wheels from any initial position. With non-holonomic and under-actuated dynamics, as well as two coupled unstable degrees of freedom, the Wheelbot provides a challenging platform for nonlinear and data-driven control research. This paper presents the Wheelbot's mechanical and electrical design, its estimation and control algorithms, as well as experiments demonstrating both self-erection and disturbance rejection while balancing. △ Less

Submitted 23 July, 2023; v1 submitted 14 July, 2022; originally announced July 2022.

Comments: Erratum: In the initial publication, Equation (3) was wrong and has been corrected in this version. Equation (3) relates to the transform from averaged body rates ${}^{\text{B}}ω_i$ to Euler rates. Importantly, the results in this papers are not affected by the wrong transform. More details are found in the projects github repo: https://github.com/AndReGeist/wheelbot-v2.5

arXiv:2206.04531 [pdf, other]

ECLAD: Extracting Concepts with Local Aggregated Descriptors

Authors: Andres Felipe Posada-Moreno, Nikita Surya, Sebastian Trimpe

Abstract: Convolutional neural networks (CNNs) are increasingly being used in critical systems, where robustness and alignment are crucial. In this context, the field of explainable artificial intelligence has proposed the generation of high-level explanations of the prediction process of CNNs through concept extraction. While these methods can detect whether or not a concept is present in an image, they ar… ▽ More Convolutional neural networks (CNNs) are increasingly being used in critical systems, where robustness and alignment are crucial. In this context, the field of explainable artificial intelligence has proposed the generation of high-level explanations of the prediction process of CNNs through concept extraction. While these methods can detect whether or not a concept is present in an image, they are unable to determine its location. What is more, a fair comparison of such approaches is difficult due to a lack of proper validation procedures. To address these issues, we propose a novel method for automatic concept extraction and localization based on representations obtained through pixel-wise aggregations of CNN activation maps. Further, we introduce a process for the validation of concept-extraction techniques based on synthetic datasets with pixel-wise annotations of their main components, reducing the need for human intervention. Extensive experimentation on both synthetic and real-world datasets demonstrates that our method outperforms state-of-the-art alternatives. △ Less

Submitted 11 August, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

Comments: 34 pages, under review

MSC Class: 68T01 ACM Class: I.2.10; I.2.m

arXiv:2205.12550 [pdf, other]

Recognition Models to Learn Dynamics from Partial Observations with Neural ODEs

Authors: Mona Buisson-Fenet, Valery Morgenthaler, Sebastian Trimpe, Florent Di Meglio

Abstract: Identifying dynamical systems from experimental data is a notably difficult task. Prior knowledge generally helps, but the extent of this knowledge varies with the application, and customized models are often needed. Neural ordinary differential equations can be written as a flexible framework for system identification and can incorporate a broad spectrum of physical insight, giving physical inter… ▽ More Identifying dynamical systems from experimental data is a notably difficult task. Prior knowledge generally helps, but the extent of this knowledge varies with the application, and customized models are often needed. Neural ordinary differential equations can be written as a flexible framework for system identification and can incorporate a broad spectrum of physical insight, giving physical interpretability to the resulting latent space. In the case of partial observations, however, the data points cannot directly be mapped to the latent state of the ODE. Hence, we propose to design recognition models, in particular inspired by nonlinear observer theory, to link the partial observations to the latent state. We demonstrate the performance of the proposed approach on numerical simulations and on an experimental dataset from a robotic exoskeleton. △ Less

Submitted 12 January, 2023; v1 submitted 25 May, 2022; originally announced May 2022.

arXiv:2202.06052 [pdf, other]

Learning by Doing: Controlling a Dynamical System using Causality, Control, and Reinforcement Learning

Authors: Sebastian Weichwald, Søren Wengel Mogensen, Tabitha Edith Lee, Dominik Baumann, Oliver Kroemer, Isabelle Guyon, Sebastian Trimpe, Jonas Peters, Niklas Pfister

Abstract: Questions in causality, control, and reinforcement learning go beyond the classical machine learning task of prediction under i.i.d. observations. Instead, these fields consider the problem of learning how to actively perturb a system to achieve a certain effect on a response variable. Arguably, they have complementary views on the problem: In control, one usually aims to first identify the system… ▽ More Questions in causality, control, and reinforcement learning go beyond the classical machine learning task of prediction under i.i.d. observations. Instead, these fields consider the problem of learning how to actively perturb a system to achieve a certain effect on a response variable. Arguably, they have complementary views on the problem: In control, one usually aims to first identify the system by excitation strategies to then apply model-based design techniques to control the system. In (non-model-based) reinforcement learning, one directly optimizes a reward. In causality, one focus is on identifiability of causal structure. We believe that combining the different views might create synergies and this competition is meant as a first step toward such synergies. The participants had access to observational and (offline) interventional data generated by dynamical systems. Track CHEM considers an open-loop problem in which a single impulse at the beginning of the dynamics can be set, while Track ROBO considers a closed-loop problem in which control variables can be set at each time step. The goal in both tracks is to infer controls that drive the system to a desired state. Code is open-sourced ( https://github.com/LearningByDoingCompetition/learningbydoing-comp ) to reproduce the winning solutions of the competition and to facilitate trying out new methods on the competition tasks. △ Less

Submitted 12 February, 2022; originally announced February 2022.

Comments: https://learningbydoingcompetition.github.io/

arXiv:2201.09562 [pdf, other]

doi 10.1016/j.artint.2023.103922

GoSafeOpt: Scalable Safe Exploration for Global Optimization of Dynamical Systems

Authors: Bhavya Sukhija, Matteo Turchetta, David Lindner, Andreas Krause, Sebastian Trimpe, Dominik Baumann

Abstract: Learning optimal control policies directly on physical systems is challenging since even a single failure can lead to costly hardware damage. Most existing model-free learning methods that guarantee safety, i.e., no failures, during exploration are limited to local optima. A notable exception is the GoSafe algorithm, which, unfortunately, cannot handle high-dimensional systems and hence cannot be… ▽ More Learning optimal control policies directly on physical systems is challenging since even a single failure can lead to costly hardware damage. Most existing model-free learning methods that guarantee safety, i.e., no failures, during exploration are limited to local optima. A notable exception is the GoSafe algorithm, which, unfortunately, cannot handle high-dimensional systems and hence cannot be applied to most real-world dynamical systems. This work proposes GoSafeOpt as the first algorithm that can safely discover globally optimal policies for high-dimensional systems while giving safety and optimality guarantees. We demonstrate the superiority of GoSafeOpt over competing model-free safe learning methods on a robot arm that would be prohibitive for GoSafe. △ Less

Submitted 12 June, 2023; v1 submitted 24 January, 2022; originally announced January 2022.

Journal ref: Artificial Intelligence, Volume 320, Year 2023

arXiv:2106.11899 [pdf, other]

Local policy search with Bayesian optimization

Authors: Sarah Müller, Alexander von Rohr, Sebastian Trimpe

Abstract: Reinforcement learning (RL) aims to find an optimal policy by interaction with an environment. Consequently, learning complex behavior requires a vast number of samples, which can be prohibitive in practice. Nevertheless, instead of systematically reasoning and actively choosing informative samples, policy gradients for local search are often obtained from random perturbations. These random sample… ▽ More Reinforcement learning (RL) aims to find an optimal policy by interaction with an environment. Consequently, learning complex behavior requires a vast number of samples, which can be prohibitive in practice. Nevertheless, instead of systematically reasoning and actively choosing informative samples, policy gradients for local search are often obtained from random perturbations. These random samples yield high variance estimates and hence are sub-optimal in terms of sample complexity. Actively selecting informative samples is at the core of Bayesian optimization, which constructs a probabilistic surrogate of the objective from past samples to reason about informative subsequent ones. In this paper, we propose to join both worlds. We develop an algorithm utilizing a probabilistic model of the objective function and its gradient. Based on the model, the algorithm decides where to query a noisy zeroth-order oracle to improve the gradient estimates. The resulting algorithm is a novel type of policy search method, which we compare to existing black-box algorithms. The comparison reveals improved sample complexity and reduced variance in extensive empirical evaluations on synthetic objectives. Further, we highlight the benefits of active sampling on popular RL benchmarks. △ Less

Submitted 22 November, 2021; v1 submitted 22 June, 2021; originally announced June 2021.

Comments: Presented at the Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS), 2021

arXiv:2105.13281 [pdf, other]

GoSafe: Globally Optimal Safe Robot Learning

Authors: Dominik Baumann, Alonso Marco, Matteo Turchetta, Sebastian Trimpe

Abstract: When learning policies for robotic systems from data, safety is a major concern, as violation of safety constraints may cause hardware damage. SafeOpt is an efficient Bayesian optimization (BO) algorithm that can learn policies while guaranteeing safety with high probability. However, its search space is limited to an initially given safe region. We extend this method by exploring outside the init… ▽ More When learning policies for robotic systems from data, safety is a major concern, as violation of safety constraints may cause hardware damage. SafeOpt is an efficient Bayesian optimization (BO) algorithm that can learn policies while guaranteeing safety with high probability. However, its search space is limited to an initially given safe region. We extend this method by exploring outside the initial safe area while still guaranteeing safety with high probability. This is achieved by learning a set of initial conditions from which we can recover safely using a learned backup controller in case of a potential failure. We derive conditions for guaranteed convergence to the global optimum and validate GoSafe in hardware experiments. △ Less

Submitted 27 May, 2021; originally announced May 2021.

arXiv:2105.12204 [pdf, other]

doi 10.1109/TAC.2022.3200948

Safe Value Functions

Authors: Pierre-François Massiani, Steve Heim, Friedrich Solowjow, Sebastian Trimpe

Abstract: Safety constraints and optimality are important, but sometimes conflicting criteria for controllers. Although these criteria are often solved separately with different tools to maintain formal guarantees, it is also common practice in reinforcement learning to simply modify reward functions by penalizing failures, with the penalty treated as a mere heuristic. We rigorously examine the relationship… ▽ More Safety constraints and optimality are important, but sometimes conflicting criteria for controllers. Although these criteria are often solved separately with different tools to maintain formal guarantees, it is also common practice in reinforcement learning to simply modify reward functions by penalizing failures, with the penalty treated as a mere heuristic. We rigorously examine the relationship of both safety and optimality to penalties, and formalize sufficient conditions for safe value functions (SVFs): value functions that are both optimal for a given task, and enforce safety constraints. We reveal this structure by examining when rewards preserve viability under optimal control, and show that there always exists a finite penalty that induces a safe value function. This penalty is not unique, but upper-unbounded: larger penalties do not harm optimality. Although it is often not possible to compute the minimum required penalty, we reveal clear structure of how the penalty, rewards, discount factor, and dynamics interact. This insight suggests practical, theory-guided heuristics to design reward functions for control problems where safety is important. △ Less

Submitted 1 December, 2022; v1 submitted 25 May, 2021; originally announced May 2021.

Comments: 16 pages, 6 figures. Accepted for publication in: Transactions of Automatic Control, special issue on Learning and Control

Journal ref: IEEE Transactions of Automatic Control 68, Issue 5 (2023) 2743 -- 2757

arXiv:2105.07668 [pdf, other]

Probabilistic Robust Linear Quadratic Regulators with Gaussian Processes

Authors: Alexander von Rohr, Matthias Neumann-Brosig, Sebastian Trimpe

Abstract: Probabilistic models such as Gaussian processes (GPs) are powerful tools to learn unknown dynamical systems from data for subsequent use in control design. While learning-based control has the potential to yield superior performance in demanding applications, robustness to uncertainty remains an important challenge. Since Bayesian methods quantify uncertainty of the learning results, it is natural… ▽ More Probabilistic models such as Gaussian processes (GPs) are powerful tools to learn unknown dynamical systems from data for subsequent use in control design. While learning-based control has the potential to yield superior performance in demanding applications, robustness to uncertainty remains an important challenge. Since Bayesian methods quantify uncertainty of the learning results, it is natural to incorporate these uncertainties into a robust design. In contrast to most state-of-the-art approaches that consider worst-case estimates, we leverage the learning method's posterior distribution in the controller synthesis. The result is a more informed and, thus, more efficient trade-off between performance and robustness. We present a novel controller synthesis for linearized GP dynamics that yields robust controllers with respect to a probabilistic stability margin. The formulation is based on a recently proposed algorithm for linear quadratic control synthesis, which we extend by giving probabilistic robustness guarantees in the form of credibility bounds for the system's stability.Comparisons to existing methods based on worst-case and certainty-equivalence designs reveal superior performance and robustness properties of the proposed method. △ Less

Submitted 21 September, 2022; v1 submitted 17 May, 2021; originally announced May 2021.

Comments: Published in the proceedings of the 3rd Conference on Learning for Dynamics and Control, this version fixes a typo in Algorithm 1

Showing 1–50 of 87 results for author: Trimpe, S