-
Some remarks on stochastic converse Lyapunov theorems
Authors:
Pavel Osinenko,
Grigory Yaremenko
Abstract:
In this brief note, we investigate some constructions of Lyapunov functions for stochastic discrete-time stabilizable dynamical systems, in other words, controlled Markov chains. The main question here is whether a Lyapunov function in some statistical sense exists if the respective controlled Markov chain admits a stabilizing policy. We demonstrate some constructions extending the classical results for deterministic systems. Some limitations of the constructed Lyapunov functions for stabilization are discussed, particularly for stabilization in mean. Although results for deterministic systems are well known, the stochastic case has been addressed in less detail, which the current paper remarks on. A distinguishing feature of this work is the study of stabilizers that possess computationally tractable convergence certificates.
Submitted 6 June, 2025;
originally announced June 2025.
-
A universal policy wrapper with guarantees
Authors:
Anton Bolychev,
Georgiy Malaniya,
Grigory Yaremenko,
Anastasia Krasnaya,
Pavel Osinenko
Abstract:
We introduce a universal policy wrapper for reinforcement learning agents that ensures formal goal-reaching guarantees. In contrast to standard reinforcement learning algorithms that excel in performance but lack rigorous safety assurances, our wrapper selectively switches between a high-performing base policy -- derived from any existing RL method -- and a fallback policy with known convergence properties. The base policy's value function supervises this switching process, determining when the fallback policy should override the base policy to ensure the system remains on a stable path. The analysis proves that our wrapper inherits the fallback policy's goal-reaching guarantees while preserving or improving upon the performance of the base policy. Notably, it operates without needing additional system knowledge or online constrained optimization, making it readily deployable across diverse reinforcement learning architectures and tasks.
Submitted 18 May, 2025;
originally announced May 2025.
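The switching idea described in the abstract above can be illustrated with a minimal Python sketch. The abstract states only that the base policy's value function supervises the switching. The concrete rule used here (accept the base action only when the value estimate exceeds the best previously accepted value by a fixed margin) is an assumption made for illustration, and the names `make_wrapper`, `base_policy`, `fallback_policy`, `value_fn`, and `margin` are hypothetical.

```python
from typing import Callable, Tuple

State = float
Action = float


def make_wrapper(
    base_policy: Callable[[State], Action],
    fallback_policy: Callable[[State], Action],
    value_fn: Callable[[State], float],
    margin: float = 1e-3,
) -> Callable[[State], Tuple[Action, str]]:
    """Build a policy that switches between a base and a fallback policy.

    Assumed rule (not taken from the paper): the base action is used only
    if the value estimate at the current state improves on the best value
    accepted so far by at least `margin`; otherwise the fallback acts.
    """
    best_value = float("-inf")  # best value estimate accepted so far

    def wrapped(state: State) -> Tuple[Action, str]:
        nonlocal best_value
        value = value_fn(state)
        if value >= best_value + margin:
            best_value = value
            return base_policy(state), "base"
        return fallback_policy(state), "fallback"

    return wrapped


# Toy 1-D setting with the goal at 0 (all choices below are hypothetical).
policy = make_wrapper(
    base_policy=lambda x: 1.0,          # a poor base policy: always push right
    fallback_policy=lambda x: -0.5 * x,  # a contraction toward the goal
    value_fn=lambda x: -abs(x),          # higher value means closer to the goal
)
first = policy(4.0)   # nothing accepted yet, so the base policy acts
second = policy(5.0)  # value got worse, so the fallback overrides
third = policy(2.5)   # value improved by more than the margin, base acts again
```

In this toy run the wrapper hands control to the fallback exactly when the value estimate fails to improve, which is the qualitative behavior the abstract describes. Any guarantee depends on the actual rule and assumptions in the paper.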
-
Multi-CALF: A Policy Combination Approach with Statistical Guarantees
Authors:
Georgiy Malaniya,
Anton Bolychev,
Grigory Yaremenko,
Anastasia Krasnaya,
Pavel Osinenko
Abstract:
We introduce Multi-CALF, an algorithm that intelligently combines reinforcement learning policies based on their relative value improvements. Our approach integrates a standard RL policy with a theoretically-backed alternative policy, inheriting formal stability guarantees while often achieving better performance than either policy individually. We prove that our combined policy converges to a specified goal set with known probability and provide precise bounds on maximum deviation and convergence time. Empirical validation on control tasks demonstrates enhanced performance while maintaining stability guarantees.
Submitted 18 May, 2025;
originally announced May 2025.
-
Quadrupedal Robot Skateboard Mounting via Reverse Curriculum Learning
Authors:
Danil Belov,
Artem Erkhov,
Elizaveta Pestova,
Ilya Osokin,
Dzmitry Tsetserukou,
Pavel Osinenko
Abstract:
The aim of this work is to enable quadrupedal robots to mount skateboards using Reverse Curriculum Reinforcement Learning. Although prior work has demonstrated skateboarding for quadrupeds that are already positioned on the board, the initial mounting phase still poses a significant challenge. A goal-oriented methodology was adopted, beginning with the terminal phases of the task and progressively increasing the complexity of the problem definition to approximate the desired objective. The learning process was initiated with the skateboard rigidly fixed within the global coordinate frame and the robot positioned directly above it. Through gradual relaxation of these initial conditions, the learned policy demonstrated robustness to variations in skateboard position and orientation, ultimately exhibiting a successful transfer to scenarios involving a mobile skateboard. The code, trained models, and reproducible examples are available at the following link: https://github.com/dancher00/quadruped-skateboard-mounting
Submitted 10 May, 2025;
originally announced May 2025.
-
Some remarks on practical stabilization via CLF-based control under measurement noise
Authors:
Patrick Schmidt,
Pavel Osinenko,
Stefan Streif
Abstract:
Practical stabilization of input-affine systems in the presence of measurement errors and input constraints is considered in this brief note. Assuming that a Lyapunov function and a stabilizing control exist for an input-affine system, the required measurement accuracy at each point of the state space is computed. This is done via the Lyapunov function-based decay condition, which, together with the input constraints, describes a set of admissible controls. Afterwards, the measurement time points are computed based on the system dynamics. It is shown that between these self-triggered measurement time points, the system evolves and converges into the so-called target ball, i.e., a vicinity of the origin, where it remains. Furthermore, it is shown that the approach ensures the existence of a control law that is admissible for all possible states, and that it introduces a connection between measurement time points, measurement accuracy, target ball, and decay. The results of the approach are shown in three examples.
Submitted 15 January, 2025;
originally announced January 2025.
-
Towards a constructive framework for control theory
Authors:
Pavel Osinenko
Abstract:
This work presents a framework for control theory based on constructive analysis to account for the discrepancy between mathematical results and their implementation in a computer, also referred to as computational uncertainty. In control engineering, the latter is usually either neglected or considered submerged into some other type of uncertainty, such as system noise, and addressed within robust control. However, even robust control methods may be compromised when the mathematical objects involved in the respective algorithms fail to exist in exact form and subsequently fail to satisfy the required properties. For instance, in general stabilization using a control Lyapunov function, computational uncertainty may distort stability certificates or even destabilize the system despite robustness of the stabilization routine with regard to system, actuator and measurement noise. In fact, battling numerical problems in practical implementation of controllers is common among control engineers. Such observations indicate that computational uncertainty should indeed be addressed explicitly in controller synthesis and system analysis. The major contribution here is a fairly general framework for proof techniques in analysis and synthesis of control systems based on constructive analysis, which explicitly states that every computation is doable only up to a finite precision, thus accounting for computational uncertainty. A series of previous works is overviewed, including constructive system stability and stabilization, approximate optimal controls, eigenvalue problems, Caratheodory trajectories, and measurable selectors. Additionally, a new constructive version of Danskin's theorem, which is crucial in adversarial defense, is presented.
Submitted 4 January, 2025;
originally announced January 2025.
-
A novel agent with formal goal-reaching guarantees: an experimental study with a mobile robot
Authors:
Grigory Yaremenko,
Dmitrii Dobriborsci,
Roman Zashchitin,
Ruben Contreras Maestre,
Ngoc Quoc Huy Hoang,
Pavel Osinenko
Abstract:
Reinforcement Learning (RL) has been shown to be effective and convenient for a number of tasks in robotics. However, it requires the exploration of a sufficiently large number of state-action pairs, many of which may be unsafe or unimportant. For instance, online model-free learning can be hazardous and inefficient in the absence of guarantees that a certain set of desired states will be reached during an episode. An increasingly common approach to address safety involves the addition of a shielding system that constrains the RL actions to a safe set of actions. In turn, a difficulty for such frameworks is how to effectively couple RL with the shielding system to make sure the exploration is not excessively restricted. This work presents a novel safe model-free RL agent called Critic As Lyapunov Function (CALF) and showcases how CALF can be used to improve upon control baselines in robotics in an efficient and convenient fashion while ensuring guarantees of stable goal reaching. The latter is a crucial part of safety, as seen generally. With CALF all state-action pairs remain explorable and yet reaching of desired goal states is formally guaranteed. Formal analysis is provided that shows the goal stabilization-ensuring properties of CALF and a set of real-world and numerical experiments with a non-holonomic wheeled mobile robot (WMR) TurtleBot3 Burger confirmed the superiority of CALF over such a well-established RL agent as proximal policy optimization (PPO), and a modified version of SARSA in a few-episode setting in terms of attained total cost.
Submitted 23 September, 2024;
originally announced September 2024.
-
Critic as Lyapunov function (CALF): a model-free, stability-ensuring agent
Authors:
Pavel Osinenko,
Grigory Yaremenko,
Roman Zashchitin,
Anton Bolychev,
Sinan Ibrahim,
Dmitrii Dobriborsci
Abstract:
This work presents and showcases a novel reinforcement learning agent called Critic As Lyapunov Function (CALF), which is model-free and ensures online stabilization of the environment, in other words, of the dynamical system. Online means that in each learning episode, the said environment is stabilized. This, as demonstrated in a case study with a mobile robot simulator, greatly improves the overall learning performance. The base actor-critic scheme of CALF is analogous to SARSA. The latter did not show any success in reaching the target in our studies. However, a modified version thereof, called SARSA-m here, did succeed in some learning scenarios. Still, CALF greatly outperformed the said approach. CALF was also demonstrated to improve a nominal stabilizer provided to it. In summary, the presented agent may be considered a viable approach to fusing classical control with reinforcement learning. Competing approaches are mostly either offline or model-based, like, for instance, those that fuse model-predictive control into the agent.
Submitted 15 September, 2024;
originally announced September 2024.
-
An agent design with goal reaching guarantees for enhancement of learning
Authors:
Pavel Osinenko,
Grigory Yaremenko,
Georgiy Malaniya,
Anton Bolychev,
Alexander Gepperth
Abstract:
Reinforcement learning is commonly concerned with problems of maximizing accumulated rewards in Markov decision processes. Oftentimes, a certain goal state or a subset of the state space attains maximal reward. In such a case, the environment may be considered solved when the goal is reached. Whereas numerous techniques, learning or non-learning based, exist for solving environments, doing so optimally is the biggest challenge. Say, one may choose a reward rate which penalizes the action effort. Reinforcement learning is currently among the most actively developed frameworks for solving environments optimally by virtue of maximizing accumulated reward, in other words, returns. Yet, tuning agents is a notoriously hard task as reported in a series of works. Our aim here is to help the agent learn a near-optimal policy efficiently while ensuring a goal reaching property of some basis policy that merely solves the environment. We suggest an algorithm, which is fairly flexible, and can be used to augment practically any agent as long as it comprises a critic. A formal proof of a goal reaching property is provided. Comparative experiments on several problems under popular baseline agents provided empirical evidence that the learning can indeed be boosted while ensuring the goal reaching property.
Submitted 21 August, 2024; v1 submitted 28 May, 2024;
originally announced May 2024.
-
On constructive extractability of measurable selectors of set-valued maps
Authors:
Pavel Osinenko,
Stefan Streif
Abstract:
This paper investigates the possibility of constructive extraction of measurable selectors from set-valued maps which may commonly arise in viability theory, optimal control, discontinuous systems, etc. For instance, existence of solutions to certain differential inclusions often requires iterative extraction of measurable selectors. Next, optimal controls are in general non-unique, which naturally leads to an optimal set-valued function. Finally, a viable control law can be seen, in general, as a selector. It is known that selector theorems are non-constructive and so selectors cannot always be extracted. In this work, we analyze under which particular conditions selectors are constructively extractable. An algorithm is derived from the theorem and applied in a computational study with a three-wheel robot model.
Submitted 9 March, 2024;
originally announced March 2024.
-
A framework for online, stabilizing reinforcement learning
Authors:
Grigory Yaremenko,
Georgiy Malaniya,
Pavel Osinenko
Abstract:
Online reinforcement learning is concerned with training an agent on-the-fly via dynamic interaction with the environment. Here, due to the specifics of the application, it is not generally possible to perform long pre-training, as is commonly done in off-line, model-free approaches, which are akin to dynamic programming. Such applications may be found more frequently in industry, rather than in pure digital fields, such as cloud services, video games, database management, etc., where reinforcement learning has been demonstrating success. Online reinforcement learning, in contrast, is more akin to classical control, which utilizes some model knowledge about the environment. Stability of the closed loop (agent plus the environment) is a major challenge for such online approaches. In this paper, we tackle this problem by a special fusion of online reinforcement learning with elements of classical control, namely, based on the Lyapunov theory of stability. The idea is to start the agent at once, without pre-training, and learn an approximately optimal policy under specially designed constraints, which guarantee stability. The resulting approach was tested in an extensive experimental study with a mobile robot. A nominal parking controller was used as a baseline. It was observed that the suggested agent could always successfully park the robot, while significantly improving the cost. While many approaches may be exploited for mobile robot control, we suggest that the experiments showed the promising potential of online reinforcement learning agents based on Lyapunov-like constraints. The presented methodology may be utilized in safety-critical, industrial applications where stability is necessary.
Submitted 16 November, 2022; v1 submitted 18 July, 2022;
originally announced July 2022.
-
On stochastic stabilization via non-smooth control Lyapunov functions
Authors:
Pavel Osinenko,
Grigory Yaremenko,
Georgiy Malaniya
Abstract:
The control Lyapunov function is a central tool in stabilization. It generalizes an abstract energy function -- a Lyapunov function -- to the case of controlled systems. It is a known fact that most control Lyapunov functions are non-smooth -- so is the case in non-holonomic systems, like wheeled robots and cars. Frameworks for stabilization using non-smooth control Lyapunov functions exist, like Dini aiming and steepest descent. This work generalizes the related results to the stochastic case. As the groundwork, a sampled control scheme is chosen in which control actions are computed at discrete moments in time using discrete measurements of the system state. In such a setup, special attention should be paid to the sample-to-sample behavior of the control Lyapunov function. A particular challenge here is a random noise acting on the system. The central result of this work is a theorem that states, roughly, that if there is a, generally non-smooth, control Lyapunov function, the given stochastic dynamical system can be practically stabilized in the sample-and-hold mode, meaning that the control actions are held constant within sampling time steps. The particular control method chosen is based on Moreau-Yosida regularization, in other words, inf-convolution of the control Lyapunov function, but the overall framework is extendable to further control schemes. It is assumed that the system noise is bounded almost surely, although the case of unbounded noise is briefly addressed.
Submitted 7 November, 2022; v1 submitted 26 May, 2022;
originally announced May 2022.
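For readers unfamiliar with the Moreau-Yosida regularization (inf-convolution) mentioned in the abstract above, the following minimal Python sketch shows how it smooths a non-smooth function. It uses the common textbook form inf over y of f(y) + (y - x)^2 / (2 * lam), evaluated by brute force on a finite grid. The scaling convention, the parameter name `lam`, the grid, and the test function |x| are assumptions for illustration and are not taken from the paper.

```python
def moreau_envelope(f, x, lam, grid):
    """Approximate inf_y { f(y) + (y - x)^2 / (2 * lam) } over a finite grid."""
    return min(f(y) + (y - x) ** 2 / (2.0 * lam) for y in grid)


# Hypothetical grid on [-5, 5] with step 0.001.
grid = [-5.0 + 0.001 * i for i in range(10001)]

# For f(x) = |x| the envelope is known in closed form (the Huber function):
# x^2 / (2 * lam) when |x| <= lam, and |x| - lam / 2 otherwise.
smooth_at_2 = moreau_envelope(abs, 2.0, 1.0, grid)     # closed form: 1.5
smooth_at_half = moreau_envelope(abs, 0.5, 1.0, grid)  # closed form: 0.125
```

The kink of |x| at the origin is replaced by a quadratic, while the function is only shifted away from the origin. This is the sense in which inf-convolution regularizes a non-smooth function.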
-
A note on stabilizing reinforcement learning
Authors:
Pavel Osinenko,
Grigory Yaremenko,
Ilya Osokin
Abstract:
Reinforcement learning is a general methodology of adaptive optimal control that has attracted much attention in various fields ranging from the video game industry to robot manipulators. Despite its remarkable performance demonstrations, plain reinforcement learning controllers do not guarantee stability, which compromises their applicability in industry. To provide such guarantees, measures have to be taken. This gives rise to what could generally be called stabilizing reinforcement learning. Concrete approaches range from employment of human overseers to filter out unsafe actions to formally verified shields and fusion with classical stabilizing controllers. A line of attack that utilizes elements of adaptive control has become fairly popular in recent years. In this note, we critically address such an approach in a fairly general actor-critic setup for nonlinear time-continuous environments. The actor network utilizes a so-called robustifying term that is supposed to compensate for the neural network errors. The corresponding stability analysis is based on the value function itself. We indicate a problem in such a stability analysis and provide a counterexample to the overall control scheme. Implications for such a line of attack in stabilizing reinforcement learning are discussed. Unfortunately, the said problem possesses no fix without a substantial reconsideration of the whole approach. As a positive message, we derive a stochastic critic neural network weight convergence analysis provided that the environment was stabilized.
Submitted 11 June, 2022; v1 submitted 24 November, 2021;
originally announced November 2021.
-
A study of first-passage time minimization via Q-learning in heated gridworlds
Authors:
M. A. Larchenko,
P. Osinenko,
G. Yaremenko,
V. V. Palyulin
Abstract:
Optimization of first-passage times is required in applications ranging from nanobot navigation to market trading. In such settings, one often encounters unevenly distributed noise levels across the environment. We extensively study how a learning agent fares in 1- and 2-dimensional heated gridworlds with an uneven temperature distribution. The results show certain bias effects in agents trained via simple tabular Q-learning, SARSA, Expected SARSA and Double Q-learning. While a high learning rate prevents exploration of regions with higher temperature, a low enough rate increases the presence of agents in such regions. The discovered peculiarities and biases of temporal-difference-based reinforcement learning methods should be taken into account in real-world physical applications and agent design.
Submitted 5 October, 2021;
originally announced October 2021.
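The temporal-difference methods named in the abstract above differ mainly in their update target. The following minimal Python sketch contrasts the standard tabular Q-learning and SARSA updates, with the learning rate as an explicit parameter. The table layout, the function names, and the numbers are hypothetical, and the heated-gridworld noise model studied in the paper is not reproduced here.

```python
def q_learning_update(q, s, a, r, s_next, lr=0.5, gamma=0.9):
    """Off-policy update: bootstrap from the greedy value of the next state."""
    target = r + gamma * max(q[s_next])
    q[s][a] += lr * (target - q[s][a])


def sarsa_update(q, s, a, r, s_next, a_next, lr=0.5, gamma=0.9):
    """On-policy update: bootstrap from the action actually taken next."""
    target = r + gamma * q[s_next][a_next]
    q[s][a] += lr * (target - q[s][a])


# Two states, two actions; q[state][action] (hypothetical values).
q_a = [[0.0, 0.0], [1.0, 2.0]]
q_learning_update(q_a, s=0, a=0, r=1.0, s_next=1)
# target = 1 + 0.9 * 2 = 2.8, so q_a[0][0] moves halfway: 1.4

q_b = [[0.0, 0.0], [1.0, 2.0]]
sarsa_update(q_b, s=0, a=0, r=1.0, s_next=1, a_next=0)
# target = 1 + 0.9 * 1 = 1.9, so q_b[0][0] moves halfway: 0.95
```

The learning rate `lr` sets how strongly each sampled target overwrites the current estimate. It is the quantity whose size the abstract links to exploration of high-temperature regions.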
-
A generalized stacked reinforcement learning method for sampled systems
Authors:
Pavel Osinenko,
Dmitrii Dobriborsci,
Grigory Yaremenko,
Georgiy Malaniya
Abstract:
A common setting of reinforcement learning (RL) is a Markov decision process (MDP) in which the environment is a stochastic discrete-time dynamical system. Whereas MDPs are suitable in such applications as video-games or puzzles, physical systems are time-continuous. A general variant of RL is of digital format, where updates of the value (or cost) and policy are performed at discrete moments in time. The agent-environment loop then amounts to a sampled system, whereby sample-and-hold is a specific case. In this paper, we propose and benchmark two RL methods suitable for sampled systems. Specifically, we hybridize model-predictive control (MPC) with critics learning the optimal Q- and value (or cost-to-go) function. Optimality is analyzed and performance comparison is done in an experimental case study with a mobile robot.
Submitted 28 November, 2022; v1 submitted 23 August, 2021;
originally announced August 2021.
-
An experimental study of two predictive reinforcement learning methods and comparison with model-predictive control
Authors:
Dmitrii Dobriborsci,
Pavel Osinenko
Abstract:
Reinforcement learning (RL) has been successfully used in various simulations and computer games. Industry-related applications, such as autonomous mobile robot motion control, remain somewhat challenging for RL to date, though. This paper presents an experimental evaluation of predictive RL controllers for optimal mobile robot motion control. As a baseline for comparison, model-predictive control (MPC) is used. Two RL methods are tested: a roll-out Q-learning, which may be considered as MPC with terminal cost being a Q-function approximation, and a so-called stacked Q-learning, which in turn is like MPC with the running cost substituted for a Q-function approximation. The experimental foundation is a mobile robot with a differential drive (Robotis Turtlebot3). Experimental results showed that both RL methods beat the baseline in terms of the accumulated cost, whereas the stacked variant performed best. Given the series of previous works on stacked Q-learning, this particular study supports the idea that MPC with a running cost adaptation inspired by Q-learning has potential for a performance boost while retaining the nice properties of MPC.
Submitted 23 August, 2021; v1 submitted 10 August, 2021;
originally announced August 2021.
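The verbal description of the three controllers in the abstract above can be written as optimization problems. The following LaTeX sketch is one plausible reading, not the paper's own formulation. The symbols are assumptions: $\rho$ for the running cost, $\hat{Q}$ for the Q-function approximation, $f$ for the prediction model, and $N$ for the horizon length, with the indexing chosen for illustration.

```latex
% Common prediction model (assumed notation): x_{k+1} = f(x_k, u_k), x_0 given.
\begin{align*}
\text{MPC:} \quad
  & \min_{u_0, \dots, u_{N-1}} \; \sum_{k=0}^{N-1} \rho(x_k, u_k), \\
\text{Roll-out Q-learning:} \quad
  & \min_{u_0, \dots, u_{N-1}} \; \sum_{k=0}^{N-2} \rho(x_k, u_k)
    + \hat{Q}(x_{N-1}, u_{N-1}), \\
\text{Stacked Q-learning:} \quad
  & \min_{u_0, \dots, u_{N-1}} \; \sum_{k=0}^{N-1} \hat{Q}(x_k, u_k).
\end{align*}
```

Under this reading, the roll-out variant changes only the terminal term of the MPC objective, whereas the stacked variant replaces every running-cost term with the Q-function approximation.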
-
Effects of sampling and horizon in predictive reinforcement learning
Authors:
Pavel Osinenko,
Dmitrii Dobriborsci
Abstract:
Plain reinforcement learning (RL) may be prone to loss of convergence, constraint violation, unexpected performance, etc. Commonly, RL agents undergo extensive learning stages to achieve acceptable functionality. This is in contrast to classical control algorithms, which are typically model-based. A direction of research is the fusion of RL with such algorithms, especially model-predictive control (MPC). This, however, introduces new hyper-parameters related to the prediction horizon. Furthermore, RL is usually concerned with Markov decision processes. But most real environments are not time-discrete. The factual physical setting of RL consists of a digital agent and a time-continuous dynamical system. There is thus, in fact, yet another hyper-parameter -- the agent sampling time. In this paper, we investigate the effects of prediction horizon and sampling of two hybrid RL-MPC-agents in a case study with a mobile robot parking, which is in turn a canonical control problem. We benchmark the agents with a simple variant of MPC. The sampling showed a kind of a "sweet spot" behavior, whereas the RL agents demonstrated merits at shorter horizons.
Submitted 23 August, 2021; v1 submitted 10 August, 2021;
originally announced August 2021.
-
On stochastic stabilization of sampled systems
Authors:
Pavel Osinenko,
Grigory Yaremenko
Abstract:
This paper addresses stochastic stabilization in case where implementation of control policies is digital, i. e., when the dynamical system is treated continuous, whereas the control actions are held constant in predefined time steps. In such a setup, special attention should be paid to the sample-to-sample behavior of the involved Lyapunov function. This paper extends on the stochastic stability…
▽ More
This paper addresses stochastic stabilization in case where implementation of control policies is digital, i. e., when the dynamical system is treated continuous, whereas the control actions are held constant in predefined time steps. In such a setup, special attention should be paid to the sample-to-sample behavior of the involved Lyapunov function. This paper extends on the stochastic stability results specifically to address for the sample-and-hold mode. We show that if a Markov policy stabilizes the system in a suitable sense, then it also practically stabilizes it in the sample-and-hold sense. This establishes a bridge from an idealized continuous application of the policy to its digital implementation. The central result applies to dynamical systems described by stochastic differential equations driven by the standard Brownian motion. Generalizations are discussed, including the case of non-smooth Lyapunov functions for systems driven by bounded noise. A brief overview of bounded noise models is given.
Submitted 7 November, 2022; v1 submitted 15 May, 2021;
originally announced May 2021.
-
On inf-convolution-based robust practical stabilization under computational uncertainty
Authors:
Patrick Schmidt,
Pavel Osinenko,
Stefan Streif
Abstract:
This work is concerned with practical stabilization of nonlinear systems by means of inf-convolution-based sample-and-hold control. It is a fairly general stabilization technique based on a generic non-smooth control Lyapunov function (CLF) and robust to actuator uncertainty, measurement noise, etc. The stabilization technique itself involves computation of descent directions of the CLF. It turns out that non-exact realization of this computation leads not just to a quantitative but also to a qualitative obstruction, in the sense that the result of the computation might fail to be a descent direction altogether, and there is also no straightforward way to relate it to a descent direction. Disturbances, primarily measurement noise, complicate the described issue even more. This work suggests a modified inf-convolution-based control that is robust w.r.t. system and measurement noise, as well as computational uncertainty. The assumptions on the CLF are mild: e.g., any piecewise smooth function, which often results from a numerical LF/CLF construction, satisfies them. A computational study with a three-wheel robot with dynamical steering and throttle under various tolerances w.r.t. computational uncertainty demonstrates the relevance of the addressed issue and the necessity of modifying the stabilization technique used. Similar analyses may be extended to other methods which involve optimization, such as Dini aiming or steepest descent.
Submitted 8 February, 2021;
originally announced February 2021.
-
Stacked adaptive dynamic programming with unknown system model
Authors:
Pavel Osinenko,
Thomas Göhrt,
Grigory Devadze,
Stefan Streif
Abstract:
Adaptive dynamic programming is a collective term for a variety of approaches to infinite-horizon optimal control. Common to all approaches is approximation of the infinite-horizon cost function based on dynamic programming philosophy. Typically, they also require knowledge of a dynamical model of the system. In the current work, application of adaptive dynamic programming to a system whose dynamical model is unknown to the controller is addressed. In order to realize the control algorithm, a model of the system dynamics is estimated with a Kalman filter. A stacked control scheme to boost the controller performance is suggested. The functioning of the new approach was verified in simulation and compared to the baseline represented by gradient descent on the running cost.
Submitted 8 July, 2020;
originally announced July 2020.
-
A reinforcement learning method with closed-loop stability guarantee
Authors:
Pavel Osinenko,
Lukas Beckenbach,
Thomas Göhrt,
Stefan Streif
Abstract:
Reinforcement learning (RL) in the context of control systems offers wide possibilities of controller adaptation. Given an infinite-horizon cost function, the so-called critic of RL approximates it with a neural net and sends this information to the controller (called "actor"). However, the issue of closed-loop stability under an RL method is still not fully addressed. Since the critic delivers merely an approximation to the value function of the corresponding infinite-horizon problem, no guarantee can be given in general as to whether the actor's actions stabilize the system. Different approaches to this issue exist. The current work offers a particular one, which, starting with a (not necessarily smooth) control Lyapunov function (CLF), derives an online RL scheme in such a way that a practical semi-global stability property of the closed loop can be established. The approach logically continues the work of the authors on parameterized controllers and Lyapunov-like constraints for RL, whereas the CLF now appears merely in one of the constraints of the control scheme. The analysis of the closed-loop behavior is done in a sample-and-hold (SH) manner, thus offering a certain insight into the digital realization. The case study with a non-holonomic integrator shows the capabilities of the derived method to optimize the given cost function compared to a nominal stabilizing controller.
Submitted 24 June, 2020;
originally announced June 2020.
-
Nonsmooth stabilization and its computational aspects
Authors:
Pavel Osinenko,
Patrick Schmidt,
Stefan Streif
Abstract:
This work briefly surveys some key stabilization techniques for general nonlinear systems, for which, as is well known, a smooth control Lyapunov function may fail to exist. A general overview of the situation with smooth and nonsmooth stabilization is provided, followed by a concise summary of basic tools and techniques, including general stabilization, sliding-mode control and nonsmooth backstepping. Their presentation is accompanied by examples. The survey concludes with some remarks on computational aspects related to the determination of sampling times and control actions.
Submitted 2 July, 2020; v1 submitted 24 June, 2020;
originally announced June 2020.
-
A method of online traction parameter identification and mapping
Authors:
Alexander Kobelski,
Pavel Osinenko,
Stefan Streif
Abstract:
Fuel consumption of heavy-duty vehicles such as tractors, bulldozers, etc. is comparatively high due to their scope of operation. The operation settings are usually fixed and not tuned to environmental factors, such as ground conditions. Yet it is exactly the ground-to-propelling-unit properties that are decisive for energy efficiency. Optimizing the latter would require a means of identifying those properties. This is the central matter of the current study. More specifically, the goal is to estimate the ground conditions from the available measurements, such as drive train signals, and to establish a map of those. The ground condition parameters are estimated using an adaptive unscented Kalman filter. A case study is provided with the actual and estimated ground condition maps. Such a mapping can be seen as a crucial milestone in optimal operation control of heavy-duty vehicles.
Submitted 23 April, 2021; v1 submitted 12 June, 2020;
originally announced June 2020.
-
Model predictive control with stage cost shaping inspired by reinforcement learning
Authors:
Lukas Beckenbach,
Pavel Osinenko,
Stefan Streif
Abstract:
This work presents a suboptimality study of a particular model predictive control with stage cost shaping based on the ideas of reinforcement learning. The focus of the suboptimality study is to derive quantities relating the infinite-horizon cost function under the said variant of model predictive control to the respective infinite-horizon value function. The base control scheme involves the usual stabilizing constraints, comprising a terminal set and a terminal cost in the form of a local Lyapunov function. The stage cost is adapted using the principles of Q-learning, a particular approach to reinforcement learning. The work is concluded by case studies with two systems for wide ranges of initial conditions.
Submitted 27 April, 2020; v1 submitted 6 June, 2019;
originally announced June 2019.
-
Practical sample-and-hold stabilization of nonlinear systems under approximate optimizers
Authors:
Pavel Osinenko,
Lukas Beckenbach,
Stefan Streif
Abstract:
It is a known fact that not all controllable systems can be asymptotically stabilized by a continuous static feedback. Several approaches have been developed throughout the last decades, including time-varying, dynamical and even discontinuous feedbacks. In the latter case, the sample-and-hold framework is widely used, in which the control input is held constant during sampling periods. Consequently, only practical stability can be achieved at best. Existing approaches often require solving optimization problems exactly to find stabilizing control actions. In practice, each optimization routine has a finite accuracy, which might influence the state convergence. This work shows what bounds on optimization accuracy are required to achieve prescribed stability margins. Simulation studies support the claim that optimization accuracy has a strong influence on the state convergence.
Submitted 22 June, 2018; v1 submitted 6 March, 2018;
originally announced March 2018.
-
Analysis of extremum value theorems for function spaces in optimal control under numerical uncertainty
Authors:
Pavel Osinenko,
Stefan Streif
Abstract:
The extremum value theorem for function spaces plays a central role in optimal control. It is known that computation of optimal control actions and policies is often prone to numerical errors which may be related to computability issues. The current work addresses a version of the extremum value theorem for function spaces under explicit consideration of numerical uncertainties. It is shown that certain function spaces are bounded in a suitable sense, i.e., they admit finite approximations up to an arbitrary precision. The proof of this fact is constructive in the sense that it explicitly builds the approximating functions. Consequently, existence of approximate extremal functions is shown. Applicability of the theorem is investigated for finite-horizon optimal control, dynamic programming and adaptive dynamic programming. Some possible computability issues of the extremum value theorem in optimal control are demonstrated by counterexamples.
Submitted 22 June, 2018; v1 submitted 18 September, 2017;
originally announced September 2017.
-
A note on Brehm's extension theorem
Authors:
Pavel Osinenko
Abstract:
Brehm's extension theorem states that a non-expansive map on a finite subset of a Euclidean space can be extended to a piecewise-linear map on the entire space. In this note, it is verified that the proof of the theorem is constructive provided that the finite subset consists of points with rational coordinates. Additionally, the initial non-expansive map needs to send points with rational coordinates to points with rational coordinates. The two-dimensional case is considered.
Submitted 3 October, 2016; v1 submitted 4 September, 2016;
originally announced September 2016.
-
A note on constructive treatment of eigenvectors
Authors:
Pavel Osinenko,
Grigory Devadze,
Stefan Streif
Abstract:
The eigenvalue problem plays a central role in linear algebra and its applications in control and optimization methods. In particular, many matrix decompositions rely upon computation of eigenvalue-eigenvector pairs, such as diagonal or Jordan normal forms. Unfortunately, numerical algorithms computing eigenvectors are prone to errors. Due to uncomputability of eigenpairs, perturbation theory and various regularization techniques only help if the matrix at hand possesses certain properties such as the absence of non-zero singular values, or the presence of a distinguishable gap between the large and small singular values. Posing such a requirement might be restrictive in some practical applications. In this note, we propose an alternative treatment of eigenvectors which is approximate and constructive. In comparison to classical eigenvectors whose computation is often prone to numerical instability, a constructive treatment allows addressing the computational uncertainty in a controlled way.
Submitted 14 July, 2016;
originally announced July 2016.