-
Assured Learning-enabled Autonomy: A Metacognitive Reinforcement Learning Framework
Authors:
Aquib Mustafa,
Majid Mazouchi,
Subramanya Nageshrao,
Hamidreza Modares
Abstract:
Reinforcement learning (RL) agents with pre-specified reward functions cannot provide guaranteed safety across the variety of circumstances that an uncertain system might encounter. To guarantee performance while assuring satisfaction of safety constraints across these circumstances, this paper presents an assured autonomous control framework that empowers RL algorithms with metacognitive learning capabilities. More specifically, the reward function parameters of the RL agent are adapted in a metacognitive decision-making layer to assure the feasibility of the RL agent, that is, to assure that the policy learned by the RL agent satisfies safety constraints specified by signal temporal logic while achieving as much performance as possible. The metacognitive layer monitors for possible future safety violations under the actions of the RL agent and employs a higher-layer Bayesian RL algorithm to proactively adapt the reward function of the lower-layer RL agent. To minimize intervention by the higher-layer Bayesian RL, the metacognitive layer uses a fitness function as a metric to evaluate the success of the lower-layer RL agent in satisfying safety and liveness specifications; the higher-layer Bayesian RL intervenes only if there is a risk of lower-layer RL failure. Finally, a simulation example is provided to validate the effectiveness of the proposed approach.
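As a rough illustration of how a metacognitive layer might score a predicted trajectory against signal temporal logic (STL) specifications, the sketch below computes the standard quantitative (robustness) semantics of a safety formula G(x <= c) and a liveness formula F(x >= g) over a finite predicted trace, and flags intervention when the robustness of their conjunction falls to or below a margin. The helpers `always_leq_robustness`, `eventually_geq_robustness`, and `should_intervene`, and the choice of minimum robustness as the fitness function, are hypothetical assumptions; the paper's actual fitness function and Bayesian RL layer are not reproduced here.

```python
from typing import Sequence, Tuple


def always_leq_robustness(trace: Sequence[float], bound: float) -> float:
    """Robustness of the STL safety formula G(x <= bound) on a finite trace.

    Equals min_t (bound - x_t); a positive value means the formula holds.
    """
    return min(bound - x for x in trace)


def eventually_geq_robustness(trace: Sequence[float], target: float) -> float:
    """Robustness of the STL liveness formula F(x >= target): max_t (x_t - target)."""
    return max(x - target for x in trace)


def should_intervene(
    predicted: Sequence[float], safe_bound: float, goal: float, margin: float = 0.0
) -> Tuple[bool, float]:
    """Hypothetical fitness check on a predicted trajectory of the lower-layer agent.

    The fitness is the robustness of (safety AND liveness), i.e. the minimum of
    the two robustness values. The higher layer is asked to adapt the reward
    only when the fitness is at or below `margin`.
    """
    fitness = min(
        always_leq_robustness(predicted, safe_bound),
        eventually_geq_robustness(predicted, goal),
    )
    return fitness <= margin, fitness


if __name__ == "__main__":
    # The last sample exceeds the safe bound 1.0, so the fitness is negative.
    print(should_intervene([0.1, 0.5, 0.9, 1.2], safe_bound=1.0, goal=0.8))
```

Taking the minimum mirrors the usual STL robustness of a conjunction, so a single violated specification is enough to trigger the (assumed) intervention rule, while a trace that is safe and reaches the goal leaves the lower-layer agent alone.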
Submitted 17 April, 2021; v1 submitted 23 March, 2021;
originally announced March 2021.
-
Fully-Heterogeneous Containment Control of a Network of Leader-Follower Systems
Authors:
Majid Mazouchi,
Farzaneh Tatari,
Bahare Kiumarsi,
Hamidreza Modares
Abstract:
This paper develops a distributed solution to the fully-heterogeneous containment control problem (CCP), in which not only the followers' dynamics but also the leaders' dynamics are non-identical. A novel formulation of the fully-heterogeneous CCP is first presented in which each follower constructs its own virtual exo-system. To enable the followers to build these virtual exo-systems, a novel distributed algorithm is developed to calculate the so-called normalized level of influences (NLIs) of all leaders on each follower, and a novel adaptive distributed observer is designed to estimate the dynamics and states of all leaders that influence each follower. Then, utilizing this virtual exo-system, a distributed control protocol is proposed based on the cooperative output regulation framework. Based on the estimates of the leaders' dynamics and states and the NLIs of the leaders on each follower, the solutions of the so-called linear regulator equations are calculated in a distributed manner, and consequently a distributed control protocol is designed to solve the output containment problem. Finally, the theoretical results are verified by numerical simulations.
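One standard way to define the weight of each leader on each follower, assumed here rather than taken from the paper, uses the Laplacian partition L = [[L1, L2], [0, 0]] with followers listed first: the weights are the rows of -L1^{-1} L2, which are nonnegative and sum to 1 when every follower has a directed path from at least one leader. The hypothetical `compute_nli` below obtains these rows with a synchronous Jacobi iteration, w_i <- (sum_j a_ij w_j + sum_k g_ik e_k) / (sum_j a_ij + sum_k g_ik), so each follower only uses its neighbors' previous estimates and the unit vectors e_k of the leaders it observes directly. The authors' distributed NLI algorithm may differ.

```python
from typing import Dict, List


def compute_nli(
    follower_adj: List[Dict[int, float]],
    leader_adj: List[Dict[int, float]],
    num_leaders: int,
    iters: int = 500,
) -> List[List[float]]:
    """Jacobi iteration for the rows of -L1^{-1} L2 (hypothetical NLI sketch).

    follower_adj[i][j] = a_ij: weight of the edge from follower j to follower i.
    leader_adj[i][k]   = g_ik: weight of the edge from leader k to follower i.
    Assumes every follower is reachable from some leader, so each follower has
    positive total in-degree and the iteration converges.
    """
    n = len(follower_adj)
    w = [[0.0] * num_leaders for _ in range(n)]
    for _ in range(iters):
        new_w = []
        for i in range(n):
            acc = [0.0] * num_leaders
            degree = 0.0
            # Weighted sum of the neighboring followers' previous estimates ...
            for j, a_ij in follower_adj[i].items():
                degree += a_ij
                for k in range(num_leaders):
                    acc[k] += a_ij * w[j][k]
            # ... plus the unit vector e_k of each directly observed leader k.
            for k, g_ik in leader_adj[i].items():
                degree += g_ik
                acc[k] += g_ik
            new_w.append([v / degree for v in acc])
        w = new_w  # synchronous update: every follower used the previous round
    return w


if __name__ == "__main__":
    # Path graph: leader 0 - F0 - F1 - F2 - leader 1 (undirected among followers).
    followers = [{1: 1.0}, {0: 1.0, 2: 1.0}, {1: 1.0}]
    leaders = [{0: 1.0}, {}, {1: 1.0}]
    for row in compute_nli(followers, leaders, num_leaders=2):
        print([round(v, 3) for v in row])
```

In this path-graph example the rows converge to (0.75, 0.25), (0.5, 0.5), and (0.25, 0.75): the middle follower, which sees no leader directly, is still assigned equal influence from both leaders through its neighbors.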
Submitted 9 June, 2021; v1 submitted 7 April, 2020;
originally announced April 2020.
-
Observer-based Adaptive Optimal Output Containment Control problem of Linear Heterogeneous Multi-agent Systems with Relative Output Measurements
Authors:
Majid Mazouchi,
Mohammad Bagher Naghibi-Sistani,
Seyed Kamal Hosseini Sani,
Farzaneh Tatari,
Hamidreza Modares
Abstract:
This paper develops an optimal solution based on relative output feedback to the containment control problem of linear heterogeneous multi-agent systems. A distributed optimal control protocol is presented for the followers that not only assures that their outputs fall into the convex hull of the leaders' outputs (i.e., the desired or safe region) but also optimizes their transient performance. The proposed optimal control solution is composed of a feedback part, depending on the followers' states, and a feed-forward part, depending on the convex hull of the leaders' states. To comply with most real-world applications, the feedback and feed-forward states are assumed to be unavailable and are estimated using two distributed observers. First, since the followers cannot directly sense their absolute states, a distributed observer is designed that estimates these states using only relative output measurements with respect to their neighbors (measured, for example, by range sensors in robotics) and the information broadcast by their neighbors. Second, an adaptive distributed observer is designed that uses information exchanged between followers over a communication network to estimate the convex hull of the leaders' states. This observer relaxes the restrictive requirement that all followers have complete knowledge of the leaders' dynamics. An off-policy reinforcement learning algorithm based on an actor-critic structure is then developed to solve the optimal containment control problem online, using relative output measurements and without requiring all followers to know the leaders' dynamics. Finally, the theoretical results are verified by numerical simulations.
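The idea of followers estimating a point in the convex hull of the leaders' outputs purely through local information exchange can be sketched with a consensus-type estimator. The sketch assumes constant leader outputs y_k and a forward-Euler discretization with step h: each follower updates zeta_i <- zeta_i + h (sum_j a_ij (zeta_j - zeta_i) + sum_k g_ik (y_k - zeta_i)), which converges to a convex combination of the leaders' outputs when every follower is reachable from some leader. This is a simplified stand-in, not the paper's adaptive observer (which additionally estimates the leaders' dynamics); `simulate_containment_observer` and `in_triangle` are hypothetical helpers.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def simulate_containment_observer(
    follower_adj: List[Dict[int, float]],
    leader_adj: List[Dict[int, float]],
    leader_outputs: List[Point],
    step: float = 0.1,
    iters: int = 1000,
) -> List[Point]:
    """Forward-Euler simulation of a consensus-type estimator (constant leaders).

    zeta_i <- zeta_i + step * (sum_j a_ij (zeta_j - zeta_i)
                               + sum_k g_ik (y_k - zeta_i))
    step * (max total in-degree) <= 1 is a simple sufficient condition for
    stability when every follower is reachable from some leader.
    """
    n = len(follower_adj)
    zeta: List[Point] = [(0.0, 0.0)] * n
    for _ in range(iters):
        new: List[Point] = []
        for i in range(n):
            xi, yi = zeta[i]
            dx = dy = 0.0
            for j, a_ij in follower_adj[i].items():  # neighboring followers
                dx += a_ij * (zeta[j][0] - xi)
                dy += a_ij * (zeta[j][1] - yi)
            for k, g_ik in leader_adj[i].items():  # directly observed leaders
                dx += g_ik * (leader_outputs[k][0] - xi)
                dy += g_ik * (leader_outputs[k][1] - yi)
            new.append((xi + step * dx, yi + step * dy))
        zeta = new  # synchronous update for all followers
    return zeta


def in_triangle(p: Point, a: Point, b: Point, c: Point, tol: float = 1e-9) -> bool:
    """True if p lies in the closed triangle abc (cross-product sign test)."""
    def cross(o: Point, u: Point, v: Point) -> float:
        return (u[0] - o[0]) * (v[1] - o[1]) - (u[1] - o[1]) * (v[0] - o[0])

    d = (cross(a, b, p), cross(b, c, p), cross(c, a, p))
    has_neg = any(x < -tol for x in d)
    has_pos = any(x > tol for x in d)
    return not (has_neg and has_pos)


if __name__ == "__main__":
    leaders = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
    followers = [{1: 1.0}, {0: 1.0, 2: 1.0}, {1: 1.0}]
    pins = [{0: 1.0}, {2: 1.0}, {1: 1.0}]
    for z in simulate_containment_observer(followers, pins, leaders):
        print([round(v, 3) for v in z], in_triangle(z, *leaders))
```

With leaders at (0, 0), (4, 0), and (0, 4), the three estimates settle at (0.5, 1), (1, 2), and (2.5, 1), all inside the leaders' triangle, even though each follower observes at most one leader.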
Submitted 30 March, 2018;
originally announced March 2018.