-
Beyond KL-divergence: Risk Aware Control Through Cross Entropy and Adversarial Entropy Regularization
Authors:
Menno van Zutphen,
Domagoj Herceg,
Duarte J. Antunes
Abstract:
While the idea of robust dynamic programming (DP) is compelling for systems affected by uncertainty, addressing worst-case disturbances generally results in excessive conservatism. This paper introduces a method for constructing control policies robust to adversarial disturbance distributions that relate to a provided empirical distribution. The character of the adversary is shaped by a regularization term comprising a weighted sum of (i) the cross-entropy between the empirical and the adversarial distributions, and (ii) the entropy of the adversarial distribution itself. The regularization weights are interpreted as the likelihood factor and the temperature, respectively. The proposed framework leads to an efficient DP-like algorithm (referred to as the minsoftmax algorithm) to obtain the optimal control policy, where the disturbances follow an analytical softmax distribution in terms of the empirical distribution, temperature, and likelihood factor. It admits a number of control-theoretic interpretations and can thus be understood as a flexible tool for integrating complementary features of related control frameworks. In particular, in the linear-model, quadratic-cost setting with a Gaussian empirical distribution, we draw connections to well-known $\mathcal{H}_{\infty}$ control. We illustrate our results through a numerical example.
Submitted 16 May, 2025;
originally announced May 2025.
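Illustrative sketch (not taken from the paper): the softmax adversary described in the abstract above can be reproduced for a finite disturbance set under one plausible reading of the regularization. The assumption here is that the adversary picks a distribution q maximizing sum_i q_i c_i + lambda * sum_i q_i log p_i + tau * H(q), where p is the empirical distribution, lambda the likelihood factor, tau the temperature, and c_i the cost-to-go under disturbance i. The paper's exact sign conventions, weighting, and direction of the cross-entropy may differ. The function names and the requirement that p be strictly positive are also assumptions of this sketch.

```python
import numpy as np


def _logits(costs, p_emp, likelihood_factor, temperature):
    # Assumed form: (c_i + lambda * log p_i) / tau, with p_i > 0 for all i.
    costs = np.asarray(costs, dtype=float)
    p_emp = np.asarray(p_emp, dtype=float)
    return (costs + likelihood_factor * np.log(p_emp)) / temperature


def adversarial_softmax(costs, p_emp, likelihood_factor, temperature):
    """Maximizer of the assumed regularized adversarial objective.

    Setting the gradient of the Lagrangian to zero gives
    q_i proportional to exp((c_i + lambda * log p_i) / tau).
    """
    z = _logits(costs, p_emp, likelihood_factor, temperature)
    z = z - z.max()  # shift for numerical stability; does not change q
    q = np.exp(z)
    return q / q.sum()


def soft_worst_case_value(costs, p_emp, likelihood_factor, temperature):
    """Optimal value of the assumed objective: tau * logsumexp(logits)."""
    z = _logits(costs, p_emp, likelihood_factor, temperature)
    m = z.max()
    return temperature * (m + np.log(np.exp(z - m).sum()))
```

Under these assumptions, a DP-like backup would minimize soft_worst_case_value over the control input at each state, which is one way to read the name "minsoftmax". With zero costs and likelihood_factor equal to temperature, the adversary returns the empirical distribution itself; as the temperature shrinks, the mass concentrates on the highest-cost disturbance.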
-
Smart Exploration in Reinforcement Learning using Bounded Uncertainty Models
Authors:
J. S. van Hulst,
W. P. M. H. Heemels,
D. J. Antunes
Abstract:
Reinforcement learning (RL) is a powerful tool for decision-making in uncertain environments, but it often requires large amounts of data to learn an optimal policy. We propose using prior model knowledge to guide the exploration process to speed up this learning process. This model knowledge comes in the form of a model set to which the true transition kernel and reward function belong. We optimize over this model set to obtain upper and lower bounds on the Q-function, which are then used to guide the exploration of the agent. We provide theoretical guarantees on the convergence of the Q-function to the optimal Q-function under the proposed class of exploring policies. Furthermore, we also introduce a data-driven regularized version of the model set optimization problem that ensures the convergence of the class of exploring policies to the optimal policy. Lastly, we show that when the model set has a specific structure, namely the bounded-parameter MDP (BMDP) framework, the regularized model set optimization problem becomes convex and simple to implement. In this setting, we also show that we obtain finite-time convergence to the optimal policy under additional assumptions. We demonstrate the effectiveness of the proposed exploration strategy in a simulation study. The results indicate that the proposed method can significantly speed up the learning process in reinforcement learning.
Submitted 8 April, 2025;
originally announced April 2025.
-
Data-Efficient Quadratic Q-Learning Using LMIs
Authors:
J. S. van Hulst,
W. P. M. H. Heemels,
D. J. Antunes
Abstract:
Reinforcement learning (RL) has seen significant research and application results but often requires large amounts of training data. This paper proposes two data-efficient off-policy RL methods that use parametrized Q-learning. In these methods, the Q-function is chosen to be linear in the parameters and quadratic in selected basis functions in the state and control deviations from a base policy. A cost penalizing the $\ell_1$-norm of Bellman errors is minimized. We propose two methods: Linear Matrix Inequality Q-Learning (LMI-QL) and its iterative variant (LMI-QLi), which solve the resulting episodic optimization problem through convex optimization. LMI-QL relies on a convex relaxation that yields a semidefinite programming (SDP) problem with linear matrix inequalities (LMIs). LMI-QLi entails solving sequential iterations of an SDP problem. Both methods combine convex optimization with direct Q-function learning, significantly improving learning speed. A numerical case study demonstrates their advantages over existing parametrized Q-learning methods.
Submitted 18 September, 2024;
originally announced September 2024.
-
How improving performance may imply losing consistency in event-triggered consensus
Authors:
David Meister,
Duarte J. Antunes,
Frank Allgöwer
Abstract:
Event-triggered control is often argued to lower the average triggering rate compared to time-triggered control while still achieving a desired control goal, e.g., the same performance level. However, this property, often called consistency, cannot be taken for granted and can be hard to analyze in many settings. In particular, although numerous decentralized event-triggered control schemes have been proposed in the past years, their performance properties with respect to time-triggered control remain mostly unexplored. In this paper, we therefore examine the performance properties of event-triggered control (relative to time-triggered control) for a single-integrator consensus problem with a level-triggering rule. We consider the long-term average quadratic deviation from consensus as a performance measure. For this setting, we show that enriching the information the local controllers use improves the performance of the consensus algorithm but renders a previously consistent event-triggered control scheme inconsistent. In addition, we do so while deploying optimal control inputs which we derive for both information cases and all triggering schemes. With this insight, we can furthermore explain the relationship between two contrasting consistency results from the literature on decentralized event-triggered control. We support our theoretical findings with simulation results.
Submitted 6 May, 2024;
originally announced May 2024.
-
Decentralized LQ-Consistent Event-triggered Control over a Shared Contention-based Network
Authors:
M. Balaghiinaloo,
D. J. Antunes,
M. H. Mamduhi,
S. Hirche
Abstract:
Consider a network of multiple independent stochastic linear systems where, for each system, a scheduler collocated with the sensors arbitrates data transmissions to a corresponding remote controller through a shared contention-based communication network. While the systems are physically independent, their optimal controller design problems may, in general, become coupled, due to network contention, if the schedulers trigger transmissions based on state-dependent events. In this article, we propose a class of probabilistic admissible schedulers for which the optimal controllers, with respect to local standard LQG costs, have the certainty equivalence property and can still be determined decentrally. We then introduce two scheduling policies within this class, one non-event-based and one event-based, both with an easily adjustable triggering probability at every time step. We prove that, for each closed-loop system, the event-based scheduler with its optimal controller outperforms the non-event-based scheduler with its associated optimal controller. Moreover, we show that, for each closed-loop system, the optimal state estimators for both scheduling policies follow a linear iteration. Finally, we provide a method to regulate the triggering probabilities of the schedulers by maximizing a network utility function.
Submitted 29 April, 2020; v1 submitted 10 October, 2019;
originally announced October 2019.
-
Trajectory Tracking for Quadrotors with Attitude Control on $\mathcal{S}^2 \times \mathcal{S}^1$
Authors:
Dave Kooijman,
Angela P. Schoellig,
Duarte J. Antunes
Abstract:
The control of a quadrotor is typically split into two subsequent problems: finding desired accelerations to control its position, and controlling its attitude and the total thrust to track these accelerations and to track a yaw angle reference. While the thrust vector, generating accelerations, and the angle of rotation about the thrust vector, determining the yaw angle, can be controlled independently, most attitude control strategies in the literature, relying on representations in terms of quaternions, rotation matrices or Euler angles, result in an unnecessary coupling between the control of the thrust vector and of the angle about this vector. This leads, for instance, to undesired position tracking errors due to yaw tracking errors. In this paper we propose to tackle the attitude control problem using an attitude representation in the Cartesian product of the 2-sphere and the 1-sphere, denoted by $\mathcal{S}^2\times \mathcal{S}^1$. We propose a non-linear tracking control law on $\mathcal{S}^2\times \mathcal{S}^1$ that decouples the control of the thrust vector and of the angle of rotation about the thrust vector, and guarantees almost global asymptotic stability. Simulation results highlight the advantages of the proposed approach over previous approaches.
Submitted 17 June, 2019;
originally announced June 2019.